Manual of


DIGITAL
LIBRARIES


Anil K. Dhiman 
Yashoda Rani 


Volume-I


ABOUT THE BOOK 


Digital Libraries are gaining popularity
among the library & information science fraternity.
The present book on Digital Libraries is an attempt to
describe various facets of the digital library in the present
scenario. The book is divided into many chapters:
Prologue, Concept of Digital Libraries, Digital Libraries
- Components and Services, Documents in Digital
Libraries, Internet and Internet Resources,
Cataloguing of Digital Resources : Metadata and its
Creation, Digital Preservation, Information Access in
Digital Libraries, Copyrights and Intellectual Property
Rights, Library Consortia, E-learning and Digital
Libraries, Open Access and Institutional Repositories,
Open Source Software and Epilogue, the last emphasizing
the challenges of Digital Libraries and their future. Two
appendices, on Links and Resources and on Internet
Companies, Library Automation Vendors and
Information Organizations, are also provided.

It is hoped that this book will serve its purpose and will
find its place among the teaching community, students
of library & information science and the working
community comprising library and information
scientists.


Dr. Anil Kumar Dhiman 
129, Vidya Vihar Colony 
Bhairon Mandir Road 
KANKHAL-249 408 (Haridwar) 
Uttarakhand 


MANUAL OF DIGITAL LIBRARIES 
(Vol. 1) 


MANUAL OF 
DIGITAL LIBRARIES 


(Volume I)




Dr. Anil Kumar Dhiman 
and 
Smt. Yashoda Rani 




Ess Ess Publications 
New Delhi 




Manual of Digital Libraries (Set of Two Volumes) 
Copyright © by Anil K. Dhiman & Yashoda Rani 


All rights reserved. No part of this book may be reproduced in any 
form or by any electronic or mechanical means including information 
storage and retrieval systems without permission in writing from the 
publisher, except by a reviewer, who may quote brief passages in a 
review. 


While extensive effort has gone into ensuring the reliability of
information appearing in this book, the publisher makes no warranty,
express or implied, on the accuracy or reliability of the information,
and does not assume and hereby disclaims any liability to any person for
any loss or damage caused by errors or omissions in this publication.


ISBN: 978-81-7000-656-5 (Set) 
ISBN: 978-81-7000-654-1 (Volume-I) 
ISBN: 978-81-7000-655-8 (Volume-II) 


Rs.3950/- (Per Set) 


First Published 2012 


Published by: 

Ess Ess Publications 

4837/24, Ansari Road, 

Darya Ganj, 

New Delhi-110 002. 

INDIA 

Phones: 23260807, 41563444 
Fax: 41563334 

E-mail: info@essessreference.com 
www.essessreference.com 


Cover Design by Patch Creative Unit 


Printed and bound in India at Salasar Imaging Systems. 


Dedicated to 


Shri Ramlal Dhiman 
(12.1.1935 — 01.1.2010) 


Contents


VOL. I

Preface
1 Prologue
    1.1. Electronic Libraries
    1.2. Functions and Roles in the Electronic Library
    1.3. Virtual Libraries
    1.4. Foundations and Evolution of Digital Library
    1.5. Attempts in India for Digital Libraries
2 Digital Libraries - The Concept Elaborated
    2.1. Definitions
    2.2. Characteristics of Digital Libraries
    2.3. Framework for Digital Library
    2.4. Advantages and Disadvantages of Digital Libraries
    2.5. Conventional Versus Digital Libraries
    2.6. Digital Library Myths
3 Digital Library Components and Services
    3.1. Components of Digital Library
    3.2. Access Infrastructure
    3.3. Computer and Network Infrastructure
    3.4. Digital Resource Organization
    3.5. Manpower Training
    3.6. Steps Involved in Building Digital Library
    3.8. Architectures and Interconnecting
    3.9. [illegible] Library Services
4 Documents and Resources in Digital Libraries
    4.1. [illegible]
    4.2. Online or Virtual Resources
    4.3. Advantages of E-resources
    4.4. Disadvantages of E-resources
    4.5. E-resources: Issues
    4.6. Archiving of E-resources
    4.7. E-Resource Archiving : Some Issues
    4.8. Factors of Acquiring Electronic Resources
    4.9. Evaluation of Electronic Resources
    4.10. Web site/Internet Resource Evaluation
    4.11. [illegible] Evaluation
    4.12. [illegible]
    4.13. Some New Resources
5 Internet and Internet Resources
    5.1. What Is Internet & How it has Evolved?
    5.2. Getting Started with Internet
    5.3. [illegible]
    5.4. [illegible]
    5.5. Internet in Libraries and Information Centres
    5.6. [illegible]
    5.7. [illegible]
6 Cataloguing Digital Resources : Metadata and Its Creation
    6.1. Challenges to Traditional Cataloguing Practices
    6.2. What Is Metadata?
    6.3. Types of Metadata
    6.4. What Does Metadata Do?
    6.5. Metadata Standards
    6.7. Metadata for Datasets
    6.8. Extensions and Profiles
    6.9. Frameworks for Interoperability and Exchange
    6.10. Metadata Crosswalks
    6.11. Metadata Registries
    6.12. Metadata Creation
    6.13. Methodology for Building Up Meta Resources
    6.14. Organization of Meta Resources
    6.15. Metadata and the Standards Process
    6.16. Tools For Building Up A Meta Resource
    6.17. Important Meta Resources
7 Digital Preservation
    7.1. Preservation Definitions in the Digital World
    7.2. Principles of Preservation
    7.3. Thirteen Ways for Digital Preservation
    7.4. Approaches for Digital Preservation
    7.5. Digitization Policy
    7.8. Digitization - Points to be Considered
    7.9. Data Models
    7.10. Choosing Software and Hardware
    7.11. Digitization Process
    7.12. Digital Archaeology


VOL. II

Preface
8 Open Access and Institutional Repositories
    8.1. Open Access: The Concept and Definition
    8.2. Open Archives Initiatives - the Mission
    8.3. Significance of Open Access
    8.4. OA Initiatives and E-prints
    8.5. Open Access Journal - How It Works?
    8.6. Some Open Access Journal Initiatives
    8.7. Trends and Impact of Open Access
    8.8. Implication of Open Access for Libraries
    8.9. Open Access Archives or Open Access
         Repositories/Institutional Repositories
    8.10. International Initiatives on Open Access Repositories
    8.11. Open Access Initiatives In India
    8.12. Open Access to Electronic Theses and Dissertations (ETD)
    8.13. Self Archiving
    8.14. Problems and Solutions of OA Repositories
    8.15. Role of Library Professional
    8.16. What Needs to be Done in India?
9 Information Access in Digital Libraries
    9.1. [illegible]
    9.2. Static and Dynamic Page
    9.3. Web In Various Platforms
    9.4. Steps In Designing A Website
    9.5. URL/Internet Address
    9.6. [illegible]
    9.7. [illegible]
    9.8. Setting-up Free Website
    9.10. Registering Your Domain Name
    9.11. Hosting Website
    9.12. Search Engines for Websites
    9.13. Submission of Website
    9.14. Technical Aspects to be Kept in Mind
    9.15. Keeping Statistics
    9.16. Information Searching - Four-phase Framework
    9.17. Search Engines - Further Expanded
    9.18. Resource Discovery Network (RDN) -
          The eLib Subject-based Information Gateways
    9.19. Other Search Tools For Specific Resources
    9.20. Searching the Internet - Some Tips
    9.21. OPACs and Web OPACs
10 Copyrights and Intellectual Property Rights
    10.1. Copyright
    10.2. Conflicts over Intellectual Property
    10.3. Intellectual Property Rights (IPRs)
    10.4. Patent
    10.5. Trademarks
    10.6. Trade Secrets
    10.7. Open Access and IPRs
    10.8. Ethics of Encryption and Inscription
    10.9. Role of Libraries in Copyrights and IPRs
11 Library Consortia
    11.1. Library Cooperation: International Scene
    11.2. Library Cooperation: Indian Scene
    11.3. Need for Library Cooperation
    11.4. [illegible]
    11.5. [illegible]
    11.6. Role of National Library in Library Consortia
12 E-Learning and Digital Libraries
    12.1. E-learning Terminology
    12.2. [illegible] E-Learning
    12.3. Common Benefits of E-learning
    12.4. Growth of E-learning and its Implications for Libraries
    12.5. Libraries and Shareable Online Resources
    12.6. E-Learning and Libraries
    12.7. E-learning Technologies in Libraries
    12.8. Components of E-learning Systems
    12.9. The Role of Digital Libraries in E-learning
    12.10. E-learners, Digital Libraries and Information Utilization
    12.11. Digital Libraries and E-learning Linkage; Institutional Concerns
    12.12. E-learners’ Expectations From Librarians
    12.13. E-learners and Digital Library Resources
    12.14. Digital Library Reference Services
    12.15. Digital Library User Instruction
    12.16. Mobile Learning
    12.17. Challenges and Opportunities
    12.18. [illegible]
13 Open Source Software for Creating Digital Library
    13.1. What is Open Source Software
    13.2. OSS and Libraries
    13.3. Digital Library Software
14 Epilogue
    14.1. Digital Library Initiatives
    14.2. Barriers to Digital Library Initiatives
    14.3. Latest Development
    14.4. RSS Feeds
    14.5. Web 2.0 and Library 2.0
    14.6. Libraries as Learning Laboratories
Appendices
    1 Links and Resources
    2 Internet Companies, Library Automation Vendors and
      Information Organizations
Bibliography
[illegible]


Preface 


Libraries have been the backbone of institutions
since the beginnings of education in the modern world. But with
the development of information and communication
technology, the library field, like other fields, is undergoing
tremendous change, as a result of which Digital Libraries are
coming into existence. However, Digital Libraries are not yet
fully developed; they are still in a phase of development. Rather,
so-called Hybrid Libraries, which combine traditional libraries
with emerging Digital Libraries, are in existence, and
we have print resources, print resources migrating to the digital
world, and born-digital resources in our libraries. Nevertheless,
we are moving towards more and more Digital Libraries in our country.


This book on Digital Libraries is an attempt to describe
the overall fundamentals of Digital Libraries and their working.
The whole book is divided into 14 chapters. Prologue, the
first chapter, gives an overview of the foundations and evolution
of Digital Libraries and of initiatives in India and the world.
The concept of Digital Libraries is further elaborated in the second
chapter of the book, through various definitions of
Digital Libraries and a framework for Digital Libraries. The
advantages and disadvantages of Digital Libraries are also
discussed in this chapter. A comparison of Conventional
versus Digital Libraries is also made, along with a discussion
of myths about Digital Libraries.


Digital Library Components and Services forms the third
chapter of the book. Various components of Digital Libraries,
viz., collection infrastructure, access infrastructure, digital
resource organization and manpower training, are discussed
along with some important functional components of Digital
Libraries. Subject gateways, portal, vortal and virtual library
components are also defined and elaborated in the chapter.
A discussion of fundamental Digital Library services is also
included here. Documents, whether they are in print form,
non-print form or available in online mode, form the major part of
any library. Documents in non-conventional form, such as optical
media comprising CDs and DVDs, and e-books, e-journals
and databases, are described along with their online
surrogates in the next chapter, on Documents in Digital Libraries.
Their evaluation is also discussed in this chapter, together with
newly emerging resources such as wikis, blogs and weblogs,
Facebook, etc.


The next chapter is on Internet and Internet Resources for
Digital Libraries. What the Internet is, its evolution, its advantages
and disadvantages, and how it works are discussed and
explained in this chapter. The concept of the World Wide Web
(WWW) and the art of surfing the net are also described. Some of
the important Internet resources in Arts & Humanities, Social
Sciences and Science & Technology are listed and discussed
in the chapter. The organization of digital resources is a
big challenge, and scientists are working on how the
resources, particularly the online resources, can be presented
and retrieved for relevant information. Here we have to rely
upon the concept of cataloguing: the cataloguing of
converted print collections or born-digital resources is not the
problem; rather, the problem lies in organizing online resources.
Metadata come into the picture here, as they play an
important role in providing relevant information to the users
who need it. The next chapter, on Cataloguing Digital
Resources: Metadata and its Creation, describes this concept
in detail. Metadata and its types, their creation and the tools for
creating metadata are discussed in this chapter.



Two types of documents are seen in Digital
Libraries: the first are those which exist in print form
and are added to the collection after being scanned or
digitized, and the second are those which are born digital.
The problem is not with the born-digital documents but
with the print formats, which need preservation. The
principles of Digital Preservation, its various approaches,
the software and hardware needed for digitizing the objects,
and the process of digitization and digital preservation are
discussed in detail in the next chapter. Finally, how they
can be put online on the web is also described in brief in
this chapter. Open Access and Institutional Repositories are
emerging as a new concept in Digital Libraries. Open access
journals, electronic databases of theses and institutional
repositories, along with self archiving, are described and
discussed in the next chapter, “Open Access and Institutional
Repositories”.


Once the contents are developed for Digital Libraries,
access is given to those who need them. The next chapter is
on Information Access in Digital Libraries, where its various
modes and applications are discussed and described.
Copyright and Intellectual Property Rights (IPRs) have been
among the issues of greatest concern to libraries since their
beginning, and they need more attention in the digital world. The
next chapter discusses some of the important aspects of IPRs in
Digital Libraries. The ultimate goal of libraries is to serve their
users with relevant and required information and resources, and
this is given much focus in the case of Digital Libraries. The role
that Digital Libraries can play in providing library facilities to
distance learners forms the next chapter, where the role of Digital
Libraries in e-learning is conceptualized. Library consortia,
groups of libraries of a similar nature formed for acquiring
e-resources in the digital world, are also gaining importance
nowadays. The next chapter pertains to the concept of library
consortia, and two important consortia of India, UGC-INFONET and
INDEST, are described and discussed in detail along with some other
consortia like the CSIR consortia and the Agriculture library consortia.


Open source is another concept which is attracting the
attention of Library and Information Scientists in the digital
library world, and open source software is being adopted in
various libraries worldwide for automation and for creating Digital
Libraries. DSpace and Greenstone are the two most popular
software packages used for creating Digital Libraries, and
these are described in detail in this chapter, along with some
other software like Fedora, but the stress is on DSpace
and Greenstone. The main emphasis is on their
installation and working. The book ends with the Epilogue, where
some challenges of Digital Libraries and their future are
projected. Besides, the book has two appendices,
on ‘Links and Resources’ and ‘Internet Companies, Library
Automation Vendors and Information Organizations’.


Though every effort has been made to make the book
up to date with the latest developments in the field, the world of
Digital Libraries is changing so fast that one cannot claim the
subject is complete in every aspect. Further, the field is so
interwoven that one component cannot be separated from
the others, so some of the topics or headings may appear
in more than one chapter; however, they are discussed in
detail at their relevant places. We are thankful to the various
authorities on the subject whose works have been consulted
in compiling the present volume, and these have been included,
with due respect, in the comprehensive
Bibliography given at the end of the book. We are also thankful
to Shri Sumit Sethi, the owner of Ess Ess Publications, New
Delhi, who was kind enough to bring out this book
in time and in a fine form, like our earlier books. Dr. Dhiman
lost his father on the very first day of 2010, but his blessings
will remain with us. This book is dedicated to his
‘sacred memory’. Besides, Dr. Dhiman is also indebted
to his mother Smt. Kamla Devi for her blessing. Smt.
Yashoda Rani is also indebted to her parents Shri Chandre
Dutt Sharma and Smt. Kamla Devi for their moral
encouragement in completion of the work. Last but not the
least, we both, Dr. Dhiman and Yashoda Rani, are thankful
to our son, Master Aman Kumar Jagdev, who suffered a lot
during our work on the manuscript of the book, because
he needs the attention of both of us more at his developing
stage.


It is hoped that this book will serve its purpose and will find
its place among the teaching community, students of library
& information science and the working community comprising
library and information scientists.


Dr. Anil Kumar Dhiman
M.A.,M.Sc., M.Lib.Sc., B.Ed.,PGDC
Ph.D (Botany) & Ph.D. (Library Scien
Information Scien.
Gurukul Kangri University
Haridwar - 249 404 (India)

and

Smt. Yashoda Rani
M.A., M.Lib S
M.Phil (Library Scient
197, Maktool P
Roorkee - 247 667 (India)

15 November 2011


1 
Prologue 


Information and Communication Technology (ICT) has
spread over all fields like the Sun's rays, which reach every nook
and corner of the world. Today the library is no exception to
this. The library was a source of information for people from all
walks of life. The role of librarians was that of knowledge
managers, who acquired, collected, stored, catalogued and
retrieved printed materials as and when required by the public
or users, according to their taste. But this traditional role has
seen a sea change in the last decade or so with the advent
and growth of communication technology. The librarians
who were more of knowledge managers have moved from that
role to become more of service providers. When the digital
revolution started, there was a notion that librarians would
no longer survive. But have they disappeared? The answer is an
emphatic no! They have come back with a bang by embracing
technology and understanding its potential.
Most libraries have started using Information and
Communication Technology (ICT) to serve their
communities in a more effective manner. The community at
large has started depending on libraries, and also demanding
that they provide better services.


But if we talk about the digital library, perceptions of the
term “digital library” are a bit like perceptions of the elephant
in the poem by John Godfrey Saxe, “The Blind Men and
the Elephant”. Based on an old Indian fable, the storyteller
presents an array of descriptions from each of six blind men
in the presence of an actual elephant. If we were to ask
several people today to define digital library, we would
undoubtedly get several different definitions. The digital library
has been variously called the electronic library, the virtual
library, the library without walls, the cybrary, the digital spatial
library, the digitized collection, and the library of the future.


While these are not necessarily synonyms for digital
library, they do have some aspects in common. The
notion of a virtual library, a term that preceded the term
digital library, is often used to describe the extension of
existing library services into the digital realm. For example,
an online catalogue was considered to be a virtual library by
some library professionals. The term electronic library became
popular in the United Kingdom to represent the same concept,
that of modernizing existing library services to include digital
resources. Others used some of the various terms mentioned
above to describe the “elephant”.


Libraries, whether they are traditional or
so-called digital ones, serve at least three roles in learning.
First, they serve a practical role in sharing expensive resources,
both physical and human. Second, libraries serve a cultural role
in preserving and organizing artifacts and ideas. Third,
libraries serve a social and intellectual role by bringing together
people and ideas. Libraries serve as
interdisciplinary places shared by learners from all disciplines.
Digital libraries extend this interdisciplinary approach by
making diverse information resources available beyond the
physical space shared by a group of learners.


The information resources, both physical and human,
are customized for specific missions that have traditionally
been separate, although common technologies, such as printing,
photography, and computing, are found across all settings,
breaking down the physical barriers between resources. Digital
libraries have obvious roles to play in formal learning by
providing teachers and learners with knowledge bases in a
variety of media. One clear difference between traditional
libraries and digital libraries is that digital libraries offer users
greater opportunity to deposit as well as to use information.


1.1. ELECTRONIC LIBRARIES 


The term ‘Electronic Library’ means many things to
many people and is interpreted in many ways. It is a “Library
without walls, networked library, desktop library, logical library,
virtual library, information nerve center, information
management center and digital library”. Oppenheim describes
it as an “organized and managed collection of information in
a variety of media: text, still image, moving image, sound or
combinations thereof, but all in digital form”. The collection is
organized and managed for the benefit of an actual and
potential user population, to give easy access to its
contents. Such an electronic library will include a number of
search or navigation aids that will both operate within that
particular library and allow access to other collections of
information connected by networks worldwide. It is a “common
vision of the researchers, librarians, publishers, scientists,
technology experts and all other kinds of information seekers
anywhere at any time”. The concept of universality of access
is not only shared by many writers but also raises the issue
of the distinction between an electronic library and a virtual
library. The “real electronic library is not a library at all but a
data warehouse”.


Traditional libraries are storehouses of materials and
knowledge, mainly in the form of books and other printed
materials. As the volume of information grows and the
traditional means of transmitting it change, librarians are
expanding their traditional role to survive and even flourish in
an electronic age. Electronic libraries are also a result
of the same. The main causes of the transition from paper
communication to electronic communication have been
identified as follows:


• Increasing cost of manufacturing books and journals.

• Decreasing the time required for publishing.

• Controlling the ever-increasing amount of data and
information.

• Recognition of some unique and special attributes of
electronic media, i.e., interaction between the user and the
information system.


The eminent librarian Prof. Lancaster wrote, “the future
of libraries will be bypassed by the modern Electronic
Libraries. All the technological developments will soon make
paperless libraries in the world by 2000 AD”. He also said
that whether we accept it or not, print on paper will give way
to electronics soon. He also believed that databases would be
available only in machine-readable form, as electronic reference
books. According to him, from 2001 onwards only a few
high-circulation journals would be published in printed form. Most
journals would be issued on-line, as well as on tape cassettes,
video discs and in many other electronic forms.


Now, many on-line journals are already available
in EIES (Electronic Information Exchange System). Several
European publishers have already developed a system to
deliver full-text journal articles on demand, which is known as
Article Delivery On-line Information System (ADONIS). Other
new technologies already in use include video and optical
digital disks, videotext systems and digital telefacsimile
equipment. All such innovations permit the user to identify
and locate information without entering a library. All such
information is electronically available at the consumer’s desk
on demand.




1.2. FUNCTIONS AND ROLES IN THE ELECTRONIC 
LIBRARY 


Community-based controls are emerging of their own
accord, and the extent to which they are proving effective is a
question worth examining. Early attempts at external controls are
assessed, and the role of librarians in enforcement examined. The
term ‘electronic library’ is intentionally used in the singular or,
more precisely, in the generic, because the term does not admit of
a plural. What were once relatively independent community,
institution and corporate libraries are now just physical branches
of one library.


Table 1.1 : Functions and Roles in the Electronic Library


Electronic Library Function         Electronic Librarian Role

Acquisition                         Selector
Cataloguing and Classification      Organizer
Repository                          Custodian
Access and Discovery                Helper
Protection                          Censor
Facilities Provision                Community Resource
Value-Added Services                Services Provider


Libraries store books and other works. Cheap magnetic,
optical and electronic storage and communications mean
that storage of materials of all kinds is being undertaken by
many institutions and many individuals. So, in the digital era,
the librarian’s acquisition and repository roles will be of greatly
decreased importance. Access and discovery aspects include
the environment for user-performed access and discovery
(the analogue of open-access card-catalogues, and walking
to the shelves yourself) and professional-assisted access
and discovery (the analogue of reader services, inter-library
loan, and commissioned search-and-delivery).




These roles are likely to become even more important
in the future, but recent developments on the World Wide Web
have made clear that they are capable of considerable
automation, provided there is a fallback ‘reader services’
function. The protection function cuts across several of the
others, because it involves selection for entry into collections,
classification, and filtering of materials provided in response
to requests. The recent spate of concerns about ‘pornography
in kiddies’ bedrooms’ has resulted in a number of protective
schemes to provide safe electronic areas in which parents
and teachers can let children loose.


The facilities necessary to access the electronic library 
include workstations, network connection, downstream 
bandwidth, printers and copiers. They extend beyond 
hardware to include messaging and mailboxes, bulletin- 
boards, and chat and conferencing tools. 


The public will not accept the role of force-fed consumer
all of the time. This implies that the information infrastructure
must have ‘relative bandwidth symmetry’, so that it does not
degenerate into another one-way broadcast medium. To
enable people to publish as well as be published at, the
facilities available in the local physical branch of the electronic
library must extend to upstream bandwidth, scanners and
graphic design tools. At the portal of the 21st century, libraries
are community centers supporting tele-working and other
forms of participation, not just information extraction and
book-based entertainment.


There are various forms of dysfunctional behaviour 
which are of varying relevance to the electronic library: 


Accidental Dysfunctionality : Among these risks, the
primary ones that need to be addressed appear to be
information overload, rumour and misinformation, and
negligent defamation. Information overload can be addressed
through the provision of pre-set ‘filters’ and ‘views’ of
databases; of a range of search tools to suit the varying
capacities of the services’ users; of educational and training
materials; of accessible reference documentation; and of
personal support. Rumour, misinformation and defamation
have hitherto been kept under some degree of control through
the publishing process. The impending disappearance of
traditional publishing houses removes that control.


There is an apparent commercial need for materials to
be checked for defamatory statements, because of the
possibility that US-style litigiousness may extend to other
countries, e.g. through a country claiming jurisdiction over
servers outside its territory but reaching into it via the Internet.
These kinds of risk require the filtering of materials going
into databases, and hence overlap with the protection and
value-added service functions. But perhaps the most
significant form of accident arises from search engines, which
spend their days and our nights trawling the publicly
accessible areas of the net and building concordances. Every
petty exchange in a semi-public/semi-closed archived
emailing list will be capable of being re-surfaced, consolidated,
reconciled and re-analysed.


In future, aspiring politicians will find their collected 
adolescent inanities played back to them during their election 
campaigns. More problematically, everyone who takes a 
public position on any issue is in continual danger of 
information-based character assassination.


Aggressive Dysfunctionality : There is some degree 
of justice in using a person’s own words against them.
In the past, libraries and librarians have seldom been taken 
to task for carrying ‘Mein Kampf’. But now, the electronic 
library extends to the provision of communications services. 


Avoidance Dysfunctionality : Avoidance involves 
disguising one’s tracks. Anonymous and pseudonymous 
documents are entirely acceptable to everyone if their
authorship is lost in the mists of time, and genuine 
whistleblowers are widely regarded as deserving of protection. 
But difficulties arise with contemporary political and even 
cultural treatises. 


These kinds of dysfunctionality that can arise in the
electronic community necessarily involve control
mechanisms. The atmosphere and ethos of electronic
communities may not replicate precisely the same kinds of
restraints that physical communities contain, but some convert
reasonably well, and many new forms have emerged. From
the viewpoint of libraries, the community no longer comprises
people who see one another, and hence the morality is likely
to be attenuated. Indeed, electronic library users do not even
see much of one another’s data-trails, as they do in emailing
lists and newsgroups, where ‘episodes’ and ‘threads’ give
rise to a kind of cultural integrity.


1.3. VIRTUAL LIBRARIES 


A ‘Virtual Library’ is a library in which
the holdings are found in electronic stacks. It is a library that
exists without any regard to a physical space or location. It
is a technological way to bring together the resources of
various libraries and information services, both internal and
external, all in one place, so users can find what they need
quickly and easily. Sounds great, right? Well, the virtual library
also has its drawbacks and limitations. Michael Schuyler
likens the virtual library to a popsicle, stating that “...if the
electricity goes off, the cold goes away, and so does the
popsicle, leaving a soggy smear on the shelf where something
substantial once resided. The virtual library suffers the same
vulnerability and the same precarious existence”. However,
when they work, virtual libraries can be very useful and very
diverse in what they contain. The options for what they can
include are virtually endless, and become more and more
boundless as technology advances. The content of
a virtual library may include, but certainly is not limited to,
CD-ROMs, Internet subscriptions, lists of annotated web links,
internal work products such as brief banks, proprietary
databases such as LexisNexis or Westlaw, and even web spiders
or push technology that deliver targeted information to the user.


There are many advantages to going virtual. Some of 
the advantages include the following: 


• It saves and/or reduces the physical space taken up by
  library materials.
• It often adds enhanced searching capabilities in a digital
  format.
• The library materials are available at the user’s desktop,
  regardless of where the user is physically located.
• It allows for the inclusion of materials only available on
  the internet or in digital format.
• It provides the user with the capability to download and
  manipulate text.
• It often allows for multiple, concurrent users.
• It eliminates the problem of a book being missing or off
  the shelf.
• It is less labour intensive.


The last advantage is sometimes not true. Although a
virtual library does not require as much time from the library
filers and shelvers, it takes a lot more time from a librarian,
and/or possibly someone in the IT department, to learn how
to install, maintain and use the product.


As with all things, virtual libraries also have
some disadvantages. Major disadvantages
may include the following:


• Every product has its own distinct user interface.
• Users need to remember different passwords for
  different products.
• The scope of coverage and available archives is often
  limited.
• There are often difficulties with downloading or printing.
• Often there is no cost saving, especially when both
  the virtual and print products are maintained.
• Not everything is available in digital format.
• There are restrictions, which vary from vendor to
  vendor, on how the product can be used.
• The virtual library relies on power and computer
  networks in order to be available for use.
• Users cannot spread everything out in front of them
  and use it all at once.
• Users are most comfortable using books.


The last point is a very interesting one. Users have a
comfort level with books. What still holds true is
that, with a book or set of books, there is a quantifiable
beginning and end, which is not as clear with digital products
and gets even more blurred when users move out onto the
internet. This is the place where librarians are most useful,
assisting users in figuring out when to stop and how to
separate the good from the bad.


Ultimately, Digital Library is not an alien phrase for
librarians. Librarians have been involved in
organizing electronic libraries, virtual libraries, net libraries
and so on for years now. The Digital Library is another
extended form of these libraries.


Earlier, terms such as Electronic Library, Digital
Library and Virtual Library were defined separately in isolated
compartments. Today, technological advancements have
made all three concepts more or less the same. Americans
have popularized the term Digital Library to denote all
three concepts, and these terms are used interchangeably.


1.4. FOUNDATIONS AND EVOLUTION OF DIGITAL 
LIBRARY 


There was a sudden shift from over-used, delicate,
voluminous manuscripts to handy, mass-produced,
compact, printed books when the printing press was invented
in the 15th century. Till then the role the librarian played was more
that of a storekeeper than a supplier of accumulated knowledge.
The writing materials were papyrus, scrolls of sheep leather,
palm leaves, potsherds, wax plates and clay tablets. These
naturally occupied a lot of space, and the librarian had to find
ways and means primarily to safeguard the precious
manuscripts from every danger, for posterity. The dangers
could come from fire, moisture, robbers, adverse climate, rats,
moths, silverfish and other creatures. This involved
continuous expenditure of time and money on repairing the
brittle texts; many a time, overuse of the manuscripts had
damaged the writing and made it illegible. Hence, the
library was primarily a museum and the librarian was
a curator. The use of the original text was the privilege of
only a few. At best the librarian spent his time and resources
on copying the older texts.


However, the printed book changed this role completely.
There was no longer any worry that a text would become extinct. The
librarian could buy more copies at a cheaper price than that of
acquiring manuscripts. It revolutionized the very concept of a library at
that time. The focus shifted to selecting carefully from
a wide range of offers; employing a system of classification as widely
accepted as possible for storage; storing these books safely
from all kinds of destroyers or robbers; and evolving
efficient ways to retrieve a book from the storage facility at
short notice. The librarian became the facilitator of knowledge
by bringing information within everyone’s reach.


In the same way, after about 500 years, with information
technology coming to maturity, its effect on library science
and on the services libraries provide is causing a radical
shift. There is a close parallel between our times and that of
the 16th century revolution. The world of yesterday was divided
into the literate and the illiterate; today the world is divided between
the IT haves and the IT have-nots. The technology of information
storage and transmission has come to stay as a culture.


But we find a few classic thought pieces by some
exceptional thinkers when we look back to the earlier part of the last
century, when the world was primarily analog. H.G. Wells wrote
a piece that proposed the idea of what some came to call a 
“world brain”. Wells actually had this notion of a world-mind 
prior to publishing his monograph, but had not yet recorded 
the idea. 


Fremont Rider set a landmark with his work entitled “The Scholar and
the Future of the Research Library”, published in 1944.
Rider was an author, publisher, inventor, and at that time
Librarian of Wesleyan University. By analyzing historical
growth statistics, he was able to demonstrate that research
libraries tended to grow at an exponential rate, causing them
to double in size every 16 years on average. He drove
that message home by calculating that, at that rate of growth,
the Yale Library would contain 200,000,000 volumes by
the year 2040 and that its catalogue would occupy eight acres
of floor space. Not only did Rider define the central problem
of research libraries (exponential growth), but he also had a
technical solution to offer. He visualized a research library of
the future that would consist entirely of microcards, which he
had just invented. Rider’s microcards would have the
catalogue entry on one side and the text of the book on the
other. It was an ingenious idea, but it proved to be impractical.
Since then, other kinds of microforms have found a useful
place in libraries; they mitigated, but did not solve, the growth
problem. Devising solutions to the problems of the growth of
libraries and information has been a prime concern of
librarians, scientists, engineers, inventors and entrepreneurs
ever since Rider called attention to it.
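
Rider’s argument rests on simple doubling arithmetic, which can be sketched in a few lines of Python. This is only an illustration of the idea of doubling every 16 years: the function names and the starting collection of 1,000,000 volumes are assumptions made for the example and are not figures taken from Rider.

```python
import math


def growth_factor(years: float, doubling_period: float = 16.0) -> float:
    """Factor by which a collection grows in `years` if it doubles
    every `doubling_period` years (16 years is the figure quoted above)."""
    return 2.0 ** (years / doubling_period)


def years_to_reach(start_volumes: float, target_volumes: float,
                   doubling_period: float = 16.0) -> float:
    """Years needed to grow from `start_volumes` to `target_volumes`."""
    return doubling_period * math.log2(target_volumes / start_volumes)


if __name__ == "__main__":
    # Hypothetical starting size, chosen only for illustration.
    start = 1_000_000
    for years in (16, 32, 64, 96):
        print(years, "years:", round(start * growth_factor(years)), "volumes")
```

Under this assumption a collection grows 64-fold in 96 years, which is why a catalogue and shelf space that look manageable today become unmanageable within a century.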


Just after World War II, Dr. Vannevar Bush, scientific
advisor to both Presidents Roosevelt and Truman, published
a classic thought piece titled “As We May Think”, in which he
postulated the notion of the Memex, short for “Memory
Extender”, which would hold the entire collection of works
pertinent to a scientist in pursuit of his research in a single
desktop device. This was a remarkable suggestion, since
the digital computer had just been built! Bush himself was an
analog thinker, and his suggestion for a desktop memory
extender was based on an analog technology involving
microforms. Rider and Bush dramatized the library and
information problem in the postwar period and set the stage
for the technical developments that followed. Ralph Shaw,
a distinguished Librarian and Dean, used Bush’s concept of
the Memex to develop an information-retrieval device in the
1950s called the Rapid Selector. It was a machine that tried
to combine electronic search and selection from a large store
of research material on reels of high-reduction film. It, too, was
ahead of its time.


Another, less-mentioned but well-known, contributor
was J.C.R. Licklider, an MIT professor who was
commissioned in the early 1960s to write a thoughtful
summary of the impact digital technologies would have on
the future library. In this monograph Licklider, an engineer,
advocated incorporating digital technologies into library work,
and postulated what might be loosely interpreted as the first
Digital Library. He described plans for developing what he
called “procognitive systems”, the over-all aim of which was
“to get the user of the fund of knowledge into something more
nearly like an executive’s or commander’s position”. Licklider
goes on to mention the idea of a desktop console, and what
he considered to be the most vital part, “the
telecommunication-telecomputation system and the cable”
that connects the console “into the procognitive utility net”.


Many credit Licklider, who conducted his study between
November 1961 and November 1963, as being the first to
fully recognize the possibility of creating what we now refer to
as a digital library. Interestingly, the book is dedicated to Dr.
Bush, although at the time of its writing, Licklider had not read
that 1945 article. Still, prior to that, the words digital library
had not been recorded as being adjacent to one another in
print or, as far as we know, in recorded speech.


Bush laid the groundwork with his suggestion that some
desktop device could store all the information the end user
would like to know, and that associative trails would relate
this information in a fashion that would make it retrievable
from that desktop. This metaphor was advanced and given
a more appealing name in the 1960s: hypertext. In 1965, Ted
Nelson renamed Memex as Hypertext, defined it as
“non-sequential writing” and explained that the structures of ideas
are not sequential. Nelson states, “They tie together every
which-way. And when we write, we are always trying to tie
things together in non-sequential ways”. Hypertext became
reality in the 1980s with the introduction of large amounts of
memory and storage in the development of personal desktop
computing.


But the boom ended in the 1970s, and escalating 
inflation, declining support for libraries, and declining student 
enrollment set the stage for a new depression in higher 
education in the 1980s and with it an absolute necessity for
research libraries to develop new and more effective ways of 
fulfilling their mission. However, another and even more 
significant trend of the 1970s provided the means for libraries 
to face that future with optimism. In the 1970s, we witnessed 
an almost explosive development of new computer, 
communications, and micrographic technologies. As the 
growth accelerated in the 1980s, it provided at affordable costs 
the advanced electronic technologies needed to implement 
successfully the several approaches for controlling growth 
and sharing resources that were marginal or unsuccessful in 
the past. 


In 1980, Tim Berners-Lee began to code the pieces of
an application that, when coupled with the Internet 10 years
later, would change the way individuals perceived access to
digital information. That initial work on the Enquire Project
would lead to what some refer to as the Internet’s killer
application: the World Wide Web. What distinguished this
client-server application software was that it not only
used graphical software to display text and graphics that
clients found on server-side data archives marked up in
HTML, but also incorporated a feature that both Bush and Nelson
had proposed: the capability to “hypertext” to another document,
turning end users into browsers instead of searchers. As browsers,
end users found the nature of computing had changed by
making large amounts of information accessible in a convenient
and flexible fashion. These types of projects provided the
groundwork for the highly interactive and highly
interconnected systems that we see today, most of which
are based on using the Internet as a connection backbone. It
is this groundwork that laid the path for digital libraries to
follow.


If we go into the depth of the evolution of the digital library, it is
seen that Vannevar Bush was one of the first to clearly describe
problems related to the modern explosion of information and to
appeal to technology to help us meet our needs regarding
scholarly communication. Twenty years later, Licklider painted 
a more complete picture, identifying the needs for better 
distributed-processing, human-computer interaction, 
document management, and retrieval. Vannevar Bush may 
have created the metaphor that sparked the imagination of 
those who followed his ideas, but few had actually developed 
a prototype Digital Library system. Project INTREX at MIT
(1965-1973) was one of the first storage and retrieval
systems designed to experiment with online, interactive,
computer-assisted retrieval of library-type information. This
experimental project involved a hybrid combination of digital 
computing and analog microforms. The project conducted a 
series of information transfer experiments, hence named 
INTREX, involving approximately 20,000 scientific and 
technical articles, all stored on microfiche. Retrieval was 
supported by an online catalog and an index, along with 
abstracts. 


In the late 1960s, Mead Data developed the Ohio Bar 
Automated Research (OBAR). This was the precursor to the 
now-familiar and popular full-text LEXIS search service, a 
legal database available online. OBAR provided online access 
to full-text legal statutes. It was one of the first online full-text 
databases, a precursor of things to come. In 1971, Michael
Hart began Project Gutenberg (http://www.gutenberg.net/)
with the goal of producing full-text versions of classic
monographs that had been cleared for copyright.
His original goal was to produce 10,000 such artifacts by 2000.
One of the Project’s goals was to distribute one trillion e-text
files by December 31, 2001. A Project Gutenberg e-text is a
public domain work distributed through the Project Gutenberg
Association. Among other things, this means that no one owns
a United States copyright on the work, so the Project can
copy and distribute it in the United States without permission
and without paying royalties. Special rules apply if you wish
to copy and distribute the e-text under the Project’s trademark.
Project Gutenberg presently contributes approximately one
e-text each day of production.


In 1982, the Library of Congress announced the Optical 
Disk Pilot Project, an electronic digital imaging system 
containing images of books, journals, and other research 
materials held by the library. In this program, a variety of visual 
media had been recorded on analog laser videodisks to test 
the ability of this technology to help preserve pictorial materials 
and to improve researchers’ access to pictorial collections.
This linked the still images selected from the LC Prints and 
Photographs Division to a microcomputer database for 
retrieval purposes. The project focused on preservation and 
access, but also examined collections management and 
security issues. 


Sophisticated information storage and retrieval systems 
were built during 1980s using state-of-the-art technology of 
distributed database management system linking different 
remote systems. These online information retrieval services 
used data files generated in the process of electronic 
phototypesetting of printed abstracting and indexing services 
and other primary journals. As such, online hosts like DIALOG
and STN had been offering not only online databases but also
full-text online journals for several years, although as
simple ASCII or text files without graphics and pictures. In
1989, there were almost 1700 full-text sources available
through sixteen online systems. The availability of CD-ROM in
the late 1980s as a medium with high storage capacity, longevity,
and ease of transportation triggered the production of several
CD-ROM information products which were earlier available
through online vendors or as conventional abstracting and
indexing services in printed format. Moreover, several full-text
databases also started appearing in the late 1980s and early
1990s, leading to the beginning of the digital era. Some of the
important full-text digital collections available on CD-ROM
include: ADONIS, IEEE/IEE Electronic Library (IEL), ABI/INFORM,
UMI’s International Business Database, UMI’s General
Reference Periodicals, Espace World, US Patent etc.


Digital document imaging systems, which employ
computer hardware and software to scan and store images
of documents in digitized formats, evolved in the early 1980s to
overcome the limitation of text storage and retrieval systems,
which could only store textual information. The earliest
application of a document imaging system was the Optical
Disk Pilot Project at the Library of Congress. Several document
imaging software packages are currently available in the
market, including OmniDoc (Newgen) and Datascan (Stacks
India), two important document imaging packages from India.


The beginning of the full-text digital library involved building
up several client systems usable in a multitude of
environments, such as MS Windows, MS DOS, Apple
Macintosh and a diversity of UNIX systems, as well as for
terminal-oriented mainframe systems, notably VT-100 and
VT-220. Upscaling a digital library in those days entailed huge
maintenance problems because all client systems had to
be upgraded and scaled for new facilities and emerging new
techniques and processes. In 1989, the American Memory
Project began with a survey of ARL membership. To help
launch the project, a consultant surveyed 101 members of 
the ARL and the 51 state library agencies. The survey 
disclosed a genuine appetite for on-line collections, especially 
in university research libraries. The project (1990-1995) 
identified multiple audiences for digital collections in a special 
survey; an end user evaluation; and thousands of 
conversations, letters, and encounters with visitors. 


The most thorough audience appraisal carried out by
the LC consisted of an end user evaluation conducted in
1992-1993. Forty-four school, college and university, and state and
public libraries were provided with a dozen American Memory
collections on CD-ROMs and video disks. Participating library
staff, teachers, students, and the public were polled about 
which digitized materials they had used and how well the 
delivery systems worked. The evaluation indicated continued 
interest by university libraries, as well as public libraries. The 
most surprising finding, however, was the strong enthusiasm 
in schools, especially at the secondary level. These initial 
explorations proved that digitizing images and various media
was, in fact, feasible, but that a continued research initiative
needed to be implemented to create a solid base for
production purposes. 


The 1990s brought in a true revolution in digital library
systems. The advent of the WWW offered a crucial advantage
with the availability of ready-to-use, publicly available,
user-friendly graphical web browsers for all prevalent platforms.
Standard WWW clients such as Netscape Navigator and
Internet Explorer are being upgraded regularly for added
functionality such as an e-mail client, support for Java and ActiveX,
and the ability to view important document formats without
having to install plug-ins for them. These browsers solved
the maintenance problem, allowing developers to concentrate
fully on the server side and not to bother with the client side.
These browsers are freely available and are easy to use,
eliminating the need for extensive support and user training.
The internet and associated technologies made it possible
for digital libraries to include multimedia objects such as text,
image, audio and video. These internet and web technologies
thus brought in the graphical components
that were missing in earlier digital library implementations.


In 1991, a workshop funded by the National Science
Foundation (NSF) and co-sponsored by the University of
Nebraska Center for Communication and Information
Sciences brought together leading scientists involved in
analyzing and retrieving textual information stored in digital
formats. This invitational workshop, titled “Future Directions
in Text Analysis, Retrieval and Understanding”, resulted in a
report on current research in the area of text retrieval and
a white paper calling for the establishment of a “National
Electronic Library”, the ideas for which originated during
the workshop. The report was published in two volumes, one
summarizing the workshop and some follow-up activities, and
the other a collection of position papers presented at the
workshop. In the first volume, M. Lesk, E. Fox, and M. McGill
called for the funding of a “National Electronic Science,
Engineering, and Technology Library”. This “White Paper
on Digital Libraries” challenged U.S. funding agencies with a
call for “International Information Competitiveness” and for
an improvement of the U.S. educational system. Its
recommendation was that “NSF should solicit proposals from
groups of researchers to create on-line libraries in key
scientific areas...”, in accordance with a set of guidelines
pointed towards developing “reasonable coverage of basic
science and engineering”, to be made available using
distributed networked technologies. The report specifically
identified NSF, the Defense Advanced Research Projects Agency
(DARPA), and NASA as potential collaborative funding
sources, and called for as much as $50 million in funding for
research and development. It was this white paper that led
directly to the call for research into digital libraries first made
in 1993.


In 1994, the Digital Library Phase One Initiative was
launched, providing up to $24 million to support six
large-scale research projects over a four-year period. This
multi-agency research initiative was jointly sponsored by the NSF,
DARPA, and NASA. It demonstrates the U.S. government’s
efforts to remain competitive in the global information
infrastructure and environment.
The shared vision is best illustrated in the mission statement
of the Digital Library Initiative:


The Initiative’s focus is to dramatically advance the
means to collect, store, and organize information in digital
forms and make it available for searching, retrieval, and
processing via communication networks.


The six research institutions receiving support from this
initial Digital Library Initiative were:


1. The University of Michigan Digital Library Research
   Project: The core of the Digital Library at the University
   of Michigan has been the “agent architecture that
   supports the teaming of agents to provide complex
   services by combining limited individual capabilities”.
   The content focuses on earth and space sciences.

2. The University of Illinois at Urbana-Champaign Digital
   Library Research Project: This research effort
   concentrated on building an experimental test bed with
   tens of thousands of full-text journal articles from
   physics, engineering, and computer science.

3. The University of California at Berkeley Digital Library
   Research Project: The Project’s goal is to develop the
   technologies for intelligent access to massive,
   distributed collections of photographs, satellite images,
   maps, full-text documents, and “multivalent” documents
   on environmental information.

4. Carnegie Mellon University Digital Library Research
   Project: The Informedia Digital Video Library at Carnegie
   Mellon University studies how multimedia digital
   libraries with digital video, audio, images, and text
   information can be established and used.

5. The Stanford University Digital Libraries Project: This
   project focuses on interoperability and aims to develop
   a single, integrated virtual library.

6. The Alexandria Project at the University of California at
   Santa Barbara: This project focuses on geographically
   referenced information and aims to provide easy access
   to collections of maps, images, pictorial material and
   other spatially referenced information.


Phase I of the Digital Libraries Initiative ran from 1994
until 1998. New funding was secured from some of the original
agencies, and several new agencies joined in to form the
1998 Digital Libraries Initiative Phase II. This funding period
runs through 2004 for several of the projects identified within
this initiative.

In 1994, the Library of Congress’s National Digital
Library Program (NDLP), supported by Ameritech, addressed
the details of electronic document imaging, text storage, and
retrieval of selected print and non-print materials held by LC.
NDLP was focused on assembling a digital library of
reproductions of primary source materials to support the study of U.S.
history and culture. In 1995, after a five-year pilot
project, the program began digitizing selected collections of
LC archival materials that chronicle the nation’s rich cultural
heritage.


To reproduce collections of books, pamphlets, motion
pictures, manuscripts, and sound recordings, the library has
created a wide array of digital entities: bitonal document
images, grayscale and colour pictorial images, digital video
and audio, and searchable texts. To provide access to the
reproductions, the project developed a range of descriptive
elements (bibliographic records, finding aids and introductory
texts and programs), as well as indexing the full texts for certain
types of content. The reproductions were produced using a
variety of tools: scanners, digital cameras, devices that
digitize audio and video, and human labor for re-keying and
encoding texts.


American Memory employs national-standard and well 


Prologue 23 


established industry-standard formats for many digital 
reproductions (e.g., texts encoded with SGML and images 
stored in Tagged Image File Format (TIFF) files or 
compressed with the Joint Photographic Experts Group 
(JPEG algorighm). In other cases, the lack of well-established 
standards has led to the use of emerging formats (e.g., 
RealAudio [for audio], QuickTime [for moving images], and MrSID 
[for maps]). The Library of Congress leads the NDLP with 
financial support from the U.S. Congress, which provided 
$3 million per year from 1995 for five years, provided each 
dollar was matched with three dollars from other sources. 
The LC has successfully raised an additional $45 million to 
create a $60 million five-year program. 
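
The funding figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function and variable names are hypothetical, and the figures are the ones given in the text.

```python
# Illustrative check of the NDLP funding arithmetic described in the text.
# All names here are hypothetical; only the figures come from the passage.

ANNUAL_APPROPRIATION = 3_000_000  # dollars per year from the U.S. Congress
YEARS = 5                         # length of the programme
MATCH_RATIO = 3                   # outside dollars required per federal dollar


def program_budget(annual, years, match_ratio):
    """Return (federal, matching, total) funding in dollars."""
    federal = annual * years
    matching = federal * match_ratio
    return federal, matching, federal + matching


federal, matching, total = program_budget(ANNUAL_APPROPRIATION, YEARS, MATCH_RATIO)
print(f"Federal: ${federal:,}  Matching: ${matching:,}  Total: ${total:,}")
```

With these inputs the federal share is $15 million, the matching share is $45 million, and the total is the $60 million programme mentioned in the text.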


Digital Libraries Initiative Phase II is a multi-agency 
initiative that seeks to provide leadership in research 
fundamental to developing the next generation of digital 
libraries; to advance the use and usability of globally 
distributed, networked information resources; and to 
encourage existing and new communities to focus on 
innovative applications areas. It seeks to address the digital 
library’s life cycle from information creation, access, and use, 
to archiving and preservation. Research to gain a better 
understanding of the long-term social, behavioural, and 
economic implications of and the effects of new digital library 
capabilities in such areas of human activity as research, 
education, commerce, defense, health services and 
recreation is an important part of this initiative. 


Besides research support from multiple agencies, 
specific projects have been identified for funding by the LC 
and various private foundations. Many of these are oriented 
towards building actual digital libraries, but research plays 
an active role in each. There has thus been a steady move 
up the technological scale for digital libraries, from the 
low-end electronic publications of the late 1980s, available as 
ASCII files, to material organized and searchable on gophers 
(1992), and then to material tagged and graphically viewable 
on WWW sites (1994). Recent growth and development in digital 
libraries can be attributed to the availability of the Internet 
and web technology as a medium of information presentation and 
delivery, and to the convenience it offers. 


Panel 1.1 : Two Pioneers of Digital Libraries 


The vision of the digital library is not new but this is a field in 
which progress has been achieved by the incremental efforts of 
numerous people over a long period of time. However, a few authors 
stand out because their writings have inspired future generations. 
Two of them are Vannevar Bush and J.C.R. Licklider. 


In July of 1945, Bush, then director of the U.S. Office of Scientific 
Research and Development, published an article titled “As We 
May Think” in the Atlantic Monthly. A copy of “As We May Think” is 
available on the Atlantic Monthly’s website. Anyone interested in libraries 
or in scientific information should read it. This article is an elegantly 
written exposition of the potential that technology offers the scientist 
to gather, store, find, and retrieve information. Much of his analysis 
rings as true today as it did 55 years ago. 


Bush commented that “our methods of transmitting and 
reviewing the results of research are generations old and by now 
are totally inadequate for their purpose”. He discussed recent 
technological advances and how they might conceivably be applied 
at some distant time in the future. He provided an outline of one 
possible technical approach, which he called Memex. An interesting 
historical footnote is that the Memex design used photography to 
store information. For many years, microfilm was the technology 
perceived as the most suitable for storing information cheaply. 


Bush is often cited as the first person to articulate the new vision 
of a library, but this is not correct. His article built on earlier work, 
much of it carried out in Germany before World War II. The 
importance of his article lies in its wonderful exposition of the 
relationship between information and scientific research, and in 
the latent potential of technology. 


Licklider was one of several people at the Massachusetts 
Institute of Technology who studied how digital computing could 
transform libraries in the 1960s. Like Bush, Licklider was most 
interested in the literature of science; however, he foresaw many 
developments that have occurred in modern computing. 


He described the research and development needed to 
build a truly usable digital library in his book Libraries of the 
Future, published in 1965. When he wrote, time-shared computing was still 
in the research laboratory, and computer memory cost a dollar a 
byte, but he made a bold attempt to predict what a digital library 
might be like 30 years later, in 1994. His predictions proved 
remarkably accurate in their overall vision, though naturally he did 
not foretell every change that has happened in 30 years. In general, 
he underestimated how much would be achieved by brute-force 
methods, using huge amounts of cheap computer power, and 
overestimated how much progress could be made from artificial 
intelligence and improvements in computer methods of natural 
language processing. 


Licklider’s book is hard to find and less well known than it should 
be. It is one of the few important documents about digital libraries 
that are not available on the Internet. 


1.5. ATTEMPTS IN INDIA FOR DIGITAL LIBRARIES 


The evolution and foundations of Digital Libraries in 
India can be seen with the establishment and development 
of computerization and library automation activities. 


Computerization had its beginnings in India when punch 
cards were used during the late 1950s and early 1960s. 
Computerization activities started in India in 1955 with 
the installation of the first computer system, HEC-2M, at the 
Indian Statistical Institute (ISI), Calcutta. A second computer, 
Ural, followed at ISI in 1958. In the 1960s, mainframe 
computers arrived in India. Minicomputers started 
penetrating the market in the late 1970s, while microcomputers 
were introduced in the country in the 1980s. However, due to 
the high cost of mainframe and mini computers, the use of punch 
cards continued till the 1980s. The shift from punch cards took 
place only after the advent of microcomputers, which were 
generally cheaper than the mainframes and the minis, so that 
many institutions could buy them. 


The period 1955-65 can be called the 
introductory phase, during which 16 computers were 
introduced in the country. The second phase, 1965-72, can 
be termed the consolidation phase; 170 computers had been 
installed in India by 1972. IBM had a lion’s share of 60-75 
percent of the computers installed in the country in these 
phases. About 120 third and fourth generation systems were 
imported into the country during 1976-81. ISI and Jadavpur 
University jointly designed and developed the first indigenous 
computer, ISIJU, which was installed at Jadavpur University in 
July 1964. However, commercial production of indigenous 
computers could start only in 1973, when the Electronics 
Corporation of India Limited (ECIL) began manufacturing them; 
during 1973-78 it installed 94 computers 
in the country. The Indian computer industry grew at a 
rate of about 10 percent per year during the late 1970s. 


Until 1985, India’s contribution to computers and 
telecommunication technologies was practically nil. The main 
reasons were the non-receptive government policies and 
hostility from trade unions. This is why the computer revolution 
of the 1970s, which marched ahead with full steam not only in 
the West but also in neighbouring countries like Singapore, 
Taiwan, Hong Kong, South Korea, Thailand and Malaysia, 
did not gain a foothold in India. With the government formed 
in 1984 adopting technology-friendly policies, the situation 
started improving slowly. In 1986 the Hindustan Computers 
Ltd. (HCL), launched a price war in microcomputers and the 
computer market in the country started growing. Many new 
companies rushed to cash in on the trend. By 1988 there were 
250 manufacturers in the field. The total number of computer 
systems increased from 120 in 1970 to around 448 and to 600 
in 1980; this rose to 2,000 computers in 1984, which saw a 
five-fold rise to reach 10,000 in 1985 and about 1 lakh by the 
end of the Seventh Plan. There was steady growth in the 
computer industry in the late 1980s. In 1984 India had an 
installed base of 9,100 computers, out of which 4,050 were 
manufactured in the country. By 1986, these numbers rose to 
92,150 and 22,150, respectively. Fierce competition brought 
down microcomputer prices, but only elite institutions 
could buy them. 


The liberalized economic policy of the government and the 
application of information technology saw all-round growth 
of the computer industry during the Eighth Plan (1992-97). 
During this period the computer industry got impetus from 
the new computer policy announced by the Department of 
Electronics (DOE), Govt. of India, in November 1994. The 
number of low-cost PC-based networks and LANs increased. 
There were an estimated 1,00,000 e-mail users in the country 
in 1995, a number expected to double by the end of the Eighth 
Plan. The production of computers using Intel’s Pentium 
processor stabilized, and many key sectors like Banking, 
Railways, Airlines, Oil, Power, Defence, Coal, S&T, Steel, 
and Education started to induct computers during this 
period. The Indian computer industry grew at a rate of 
about 10 percent per year during the late 1970s, and in 1985 the 
production of computers reached Rs. 2000 million. The 
Ninth Plan document of DOE expected to achieve a 
production value of Rs. 178.5 billion by the year 2001-02, 
with a cumulative growth of 46 percent per annum. 


In the recent years, the production of PCs has registered 
a phenomenal growth. The 286s, 386s and the 486s were 
taken out of production line. During 1996-97 a total of 4,67,387 
PCs were produced of which more than 50 percent (2,77,386) 
were Pentiums. In 1999, the country had an installed base of 
about 3.2 million PCs of which about 56 percent (1.8 million) 
were 386 and above. This base was expected to reach 4.47 million 
in 2000 and 9.64 million by the year 2002. In 1998, about 
1,65,000 PCs were sold with expected sales of 3,14,000 PCs 
in 1999 and about 5,00,000 in the year 2000. 


In the late 1990s, the computer industry, both 
hardware and software, attained maturity. This was the result 
of the new computer policy of 1994, the IT-friendly measures 
of the government including the IT Action Plan, and the 
realization of the necessity of automation in all walks of life. 
During the year 1994-95 the growth was 60 percent over the 
previous year; in the following year (1995-96) it was 50 
percent. In the recent past, the small and medium enterprises 
and the home market gained momentum and the falling prices 
helped boost sales. Although the growth in 1997-98 was only 
20 percent due to the slow economic growth and instability, it 
picked up in the subsequent year and registered a growth of 
32.4 percent. 


Although these statistics are encouraging, the PC 
density (number of PCs per 100 population) in the country is 
too low for comfort. This figure indicates the usage of PCs in 
various activities such as business, marketing, information 
retrieval, electronic messaging, e-mail and file transfer, and shows 
the extent of the penetration of computer culture in society. 
According to the International Telecommunication Union (ITU), in 
1995 India had only 0.1 computers per 100 people, or one 
computer for every 1000 population. This rose to 0.2 in 1998 
and was expected to reach 2 by the year 2008. The developed 
countries have 10 PCs and above, with the USA and 
Switzerland leading with PC densities of 29.7 and 28.8, 
respectively. However, the falling prices of PCs, because 
of fierce competition between firms to have a niche in the 
market, resulted in reduced value of sales. Unlike the initial 
brands (like PC-XT, 286 and 386), the prices of later brands 
like PC 486 and Pentium fell steeply. In just two years (1994-96) 
the prices of Pentiums fell from about Rs. 2,00,000 
to Rs. 45,000, finally settling around Rs. 30,000, the average selling 
value of a PC in 1999. Currently a Pentium 4 desktop with 1.5 GHz 
clock speed sells at around Rs. 30,000 or less. All 
these resulted in increased computer literacy and made 
computerization a reality in all sections in India. 
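
The PC density measure used above is a simple ratio, and the conversion between "PCs per 100 people" and "people per PC" can be sketched as follows. The function names are hypothetical, and the inputs are the densities quoted in the text.

```python
# Illustrative conversion between PC density (PCs per 100 people) and the
# number of people sharing one PC. Function names are hypothetical.


def pc_density(pcs, population):
    """Number of PCs per 100 people."""
    return pcs / population * 100


def people_per_pc(density_per_100):
    """Number of people sharing one PC at a given density."""
    return 100 / density_per_100


# Densities quoted in the text (PCs per 100 people).
quoted = {
    "India 1995": 0.1,
    "India 1998": 0.2,
    "Switzerland": 28.8,
    "USA": 29.7,
}

for place, density in quoted.items():
    print(f"{place}: about {people_per_pc(density):.1f} people per PC")
```

A density of 0.1 corresponds to roughly one PC per 1000 people, which is the equivalence stated in the text for India in 1995.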


1.5.1. Development of Library Automation Software in India 


While the software industry is a well organized sector, 
Library Software is largely confined to some premier 
institutions and their libraries. It is only recently that the private 
sector entered this field. Even then, the aggressive marketing 
efforts seen in the case of computer software are not there. 
Many of the library automation efforts using in-house expertise 
were built around the existing PCs and software packages 
such as dBase and FoxPro. In some cases, high-level 
programming languages like COBOL, BASIC, C, C++ and 
Pascal were used. Initially, only simple and important 
functions such as acquisitions, circulation and cataloguing 
were automated; integrated packages were thought of at a 
later date. These efforts generally led to the development of 
a total integrated system, or were used to convince the 
authorities to purchase library automation software from 
the market. In some cases, the vendor of the library 
automation software helped to convert the automated data 
into a compatible form for use with the new software. Firms 
like LibSys undertook creation of bibliographic records of the 
library holdings. Both the MARC and CCF formats are used 
now to develop bibliographic databases. 


After the introduction of UNESCO’s CDS/ISIS software package 
in Indian libraries in the mid-1980s, NISSAT 
organized a number of training programmes on the application 
of CDS/ISIS to library activities in the 1990s. These courses, 
besides training the professionals in using the software, have 
made them aware of the benefits of library automation and 
also introduced computer culture among the library 
professionals. This also gave impetus to many institutions 
for developing their own library software suitable for their 
libraries with special emphasis on the routines and services 
important to the institution. Thus began the indigenous efforts 
towards developing integrated library software packages in 
the country. 


Some of the special libraries such as those at Bharat 
Heavy Electricals Ltd. (BHEL), Steel Authority of India Ltd. 
(SAIL), International Crops Research Institute for the Semi- 
Arid Tropics (ICRISAT), INSDOC, NIC, DESIDOC, and IIT, 
Kanpur have successfully developed software for library 
automation. Specific mention is to be made of DESIDOC, 
which has developed three software packages. The first was 
named the Defence Library Management System (DELMS) 
and was developed in COBOL under a multi-user Unix 
environment. This software was provided to DELNET, 
and later to INFLIBNET under the name Integrated Library 
Management System (ILMS). While DELNET switched over 
to Libris and DELSIS subsequently, INFLIBNET has 
developed a new package, Software for University Libraries 
(SOUL). DESIDOC also developed a software package for 
NISSAT called Sanjay, an integrated library management 
software using UNESCO’s CDS/ISIS in the Pascal language. 
Later DESIDOC developed Suchika, an integrated software 
package in C++ language. Its first version was developed in 
DOS and Unix platforms to suit small and large libraries and 
later Suchika version 2 was also developed on Windows NT 
platform. Computer Maintenance Corporation Ltd (CMC), a 
public sector company specializing in computers, developed 
an integrated software package called Maitrayee, suitable for a 
library network environment. This software was developed 
with NISSAT support for CALIBNET. INSDOC initially 
developed CATMAN to support cataloguing in the National 
Science Library and Granthalaya, an integrated library 
automation software package. 


These institutional efforts inspired software 
development firms in the private sector, with the result that 
many commercial software packages were later developed 
and marketed. LIBSYS, TROODON, LIBMAN and Alice for 
Windows are a few of them. 


1.5.2. Library Automation Activities in India 


Although some institutions in the country, like ISI and the IITs, 
imported mainframe computers in the late 1950s and 
early 1960s, priority was accorded to productivity- and 
R&D-linked jobs. This was because of the huge costs involved 
in getting mainframes, and also because library 
work was generally viewed as not so important by the 
concerned authorities, who accorded it lower priority in allotting 
computer time. Many of the libraries and 
information centers started using computers for their work 
after the introduction of mini computers during the late 1970s. 
As even these were generally costly, only elite institutions in the 
public, academic, R&D and private sectors could afford them, 
and so the libraries in these institutions were able to utilize 
them to some extent. Library automation, as a result, did not 
progress satisfactorily. However, the arrival of 
microcomputers and personal computers (PCs) in the Indian 
market in the 1980s gave the necessary impetus and the 
environment began to change and library automation picked 
up momentum. 


Indian National Scientific Documentation Centre 
(INSDOC), one of the pioneer institutions in library automation 
field, started using computers for information processing in 
1964 utilizing the IBM 1620 at IIT, Kanpur for its union 
catalogue. It also utilized the IBM 1620 at Delhi University 
for other related jobs. Documentation Research and Training 
Centre (DRTC), Bangalore also started the computerization 
work in the late 1960s. A Document Finding System was 
designed and developed with programs to prepare catalogue 
on tape which was later tested on the IBM 1401 system at 
ISI, Calcutta. In 1970, the library of NAL, Bangalore made 
efforts in computerizing the circulation control with an ICL1004 
system. There were nine libraries using computers 
in the country. The routines for which these libraries used 
computerized procedures included: procurement (one 
library), charging and discharging of documents (one library), 
cataloguing (two libraries), preparing union catalogue (one 
library), and preparing addition lists (four libraries). INSDOC 
started providing computerized SDI service from January 
1976 using the IBM 370/155 computer at IIT, Madras and 
the CAN/SDI software with CA Condensates Database. 
INSPEC A&B databases were also used from 1977 for 
providing SDI services. In 1977 BHEL (R&D), Hyderabad 
started providing SDI services to the various units using 
computers. During the 1970s a few more libraries started using 
computers for library routines. Notable among them were 
the Tata Institute of Fundamental Research (TIFR), Mumbai 
and the Space Applications Centre (SAC), Ahmedabad. A 
number of seminars and workshops were conducted on 
various facets of library automation during this period by 
national institutions like SIET, DRTC, BARC and INSDOC. 


This situation improved in the 1980s and the early 1990s 
with the launching of national and metropolitan networks. 
Further, during this period the prices of computer hardware 
and software started coming down, making them 
affordable to many libraries. Metropolitan networks like 
CALIBNET and DELNET, professional associations like ILA, 
AGLIS and IASLIC, and national institutions like INSDOC, 
DRTC, and SIET started training programmes in automation 
of libraries and bibliographic database development using 
CDS/ISIS and other software packages. National institutions like 
DRTC, INSDOC and DESIDOC were actively engaged in 
such programmes. INFLIBNET, Ahmedabad, started providing 
financial assistance to academic libraries for library 
automation. Agencies like NISSAT also supported such 
activities. INFLIBNET has supported 150 universities/deemed 
universities in creating infrastructure facilities, 
including buying PCs and modems, developing databases, 
and getting telephone and Internet connectivity. It has also 
provided recurring grants for some activities for 5 years after 
the initial grant was utilized. These efforts paid rich dividends 
and resulted in a significant level of automation of academic 
and research libraries in the 1990s. 


The main players in library automation in the past 
decade have been the special libraries of the country. Most 
of these library and information centers are in the R&D 
institutions under the central government and in universities. 
These include the Council of Scientific and Industrial Research 
(CSIR), Department of Atomic Energy (DAE), Defence 
Research and Development Organization (DRDO), 
Department of Science and Technology (DST), Indian Council 
of Agricultural Research (ICAR), Indian Council of Medical 
Research (ICMR), Indian Space Research Organization 
(ISRO), Public Sector Undertakings (PSUs) and the 
institutions of national importance like IITs, Indian Institute of 
Science (IISc), All India Institute of Medical Sciences (AIIMS), 
and National Medical Library (NML). Although special libraries 
took the lead initially, many university libraries from major 
institutions in arts, humanities, social and behavioural 
sciences, and management were increasingly participating 
in library automation. 


Some special factors favoured special libraries, which 
were able to undertake library automation. These include: (i) 
easier decision making due to the relative autonomy they 
possess, being in publicly funded organizations; (ii) the pressure 
these libraries experience to provide efficient services and 
better, wider access to information; (iii) the wide availability of 
PCs; and (iv) the free availability of UNESCO’s Micro 
CDS/ISIS, which facilitated easy development of databases. 
Another factor is that in many of the institutions, internal talent 
was available in the form of computer specialists who were 
responsible for the in-house development of library software. 


It is worth mentioning here that the 15th annual 
convention and conference on digital libraries was organized 
by the Society for Information Science (New Delhi) at 
Bangalore during 18-20 January 1996. Thirty-one papers on various 
aspects of digital libraries were presented at the conference on 
the following topics: 


(i) Collecting, capturing, and filtering digitalized information. 
(ii) Cataloging, indexing, and processing. 
(iii) Networking systems and electronic access. 
(iv) Information professionals for digital information systems. 
(v) Critical views on effects and achievements. 
(vi) Social, economical and psychological implications. 
(vii) Indian scenario, problems and prospects, etc. 


The Indian Institute of Science (IISc), Bangalore, which is 
considered a pioneer institute in India, has established a digital 
library. This digital library used IBM’s digital library 
software running on fast computers. IISc produces about 
1,000 papers and around 200 doctoral theses every year. 
These documents were made available through the digital library. 
Journals were also available in digitalized versions. 
Information sources on CD-ROM were also available in this 
digital library. Efforts at digitalization in the Indian petroleum 
industry, and the use of electronic communication for geoscientific 
data and information sources, date back to the seventies. 
Library and information units of the Indian petroleum industry, 
especially the Oil and Natural Gas Commission, have evolved 
various specialised bibliographic databases, online public 
access catalogues of holdings, and retrieval services. 


All the above factors, together with the advancement of the Internet and 
the World Wide Web (WWW), paved the way for establishing digital 
or automated libraries at various levels in India. But it was 
only the last quarter of the 20th century that could be recognized 
as the starting era for digital libraries in India. 


Since 2001, many projects on the initiation of Digital Libraries 
have been started, and they are flourishing well. A detailed 
account of some of them is given in the last chapter of this 
book. 


In summary, it can be said that Digital Libraries and DL 
Development are new avenues of exploration for 
professionals to deliver content directly to end users. Libraries 
and librarians are constantly seeking to develop new 
resources that serve their users, especially if they are 
convenient and flexible in nature. The marketplace for digital 
library development is nascent and, with time, will grow to be 
a worthy contributor to the overall marketplace for information 
service. 


But, what does the future hold? What will the library of 
the future look like? Or will it even exist at all? The simple 
answer is yes, it will exist. It will most likely still be similar to 
what it is today: a carefully thought-out mix of electronic and 
paper resources. Over time products will find their own niche, 
or more precisely, librarians will figure out what products are 
better in print and what products are better electronically, 
striking the appropriate balance between the two. Dr. S.R. 
Ranganathan’s classic five laws of library science, the guiding 
spirit behind architecting and managing libraries as 
knowledge supermarkets, also support this future. The same 
may be rephrased as given below, with somewhat different 
relative emphasis, to guide us in architecting and managing 
Digital Information Systems of the 21st century. 


• Digital resources are for use. 
• Every user seeks digital resources. 
• Every digital resource needs its user. 
• Save the time of the user. 
• Digital library is a growing organism worldwide. 


So, the future does not look too much different from 
the present. And, surely, we are not moving any closer to the 
“paperless society” that everyone is talking about. In fact, we 
are moving farther in the opposite direction every day. 


What will the role of the librarian be in all of this? Will 
librarians even exist at all? The simple answer, again, is yes, 
definitely. Librarians will continue to be able to do and provide 
more for users than ever before with the advantages provided 
by virtual libraries. We will continue to work towards providing 
users with seamless, organized access to virtual library 
resources. Perhaps we will even start to push the envelope 
and become innovators in the use of non-traditional training 
and reference services. Who knows? As long as librarians 
continue to share with one another, both informally and on 
Internet discussion lists, and at conferences, the opportunities 
and possibilities are ours for the taking. 


2 
Digital Libraries — The 
Concept Elaborated 


The change in a society calls for a change in attitude of 
the managements, technologists, institutions and users. 
Again, the innovations are not readily accepted by the 
community at once due to lack of awareness, expertise, cost 
benefit analysis and suitable end-user cost. Previously, 
libraries had to depend largely on their own staff to prepare 
in-house catalogue cards. The need for standardization, even 
in the manual age, led to AACR-2 (Anglo-American 
Cataloguing Rules-2), and later to the ISBD (International 
Standard Bibliographic Description) format. Computerization of 
these bibliographic descriptions led to a further requirement 
for standards of bibliographic description in MAchine-Readable 
Cataloguing (MARC) formats. This requirement was 
born of the need for an “exchange” medium for bibliographic data. 
Computerization initially took place in large libraries for 
management convenience. A centrally located store of 
bibliographic records in machine readable format could be 
used as a resource by many libraries. 
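
The idea of an exchange format can be illustrated with a small sketch. The record below uses three widely known MARC field tags (100 for the main author, 245 for the title, 260 for publication details) and the conventional `$` subfield marker, but the text layout, the function names and the round-trip logic are simplifying assumptions made for illustration. This is not the actual MARC transmission encoding (ISO 2709), and it would not cope with values that themselves contain a `$`.

```python
# A simplified, MARC-style bibliographic record: tag -> {subfield code: value}.
# The layout and helper functions are illustrative assumptions only.

record = {
    "100": {"a": "Licklider, J. C. R."},
    "245": {"a": "Libraries of the future"},
    "260": {"c": "1965"},
}


def to_exchange_text(rec):
    """Serialize a record into one line per field, e.g. '245 $aTitle'."""
    lines = []
    for tag in sorted(rec):
        subfields = "".join(f"${code}{value}" for code, value in rec[tag].items())
        lines.append(f"{tag} {subfields}")
    return "\n".join(lines)


def from_exchange_text(text):
    """Parse the text produced by to_exchange_text back into a record."""
    rec = {}
    for line in text.splitlines():
        tag, _, rest = line.partition(" ")
        subfields = {}
        for chunk in rest.split("$")[1:]:  # text before the first '$' is empty
            subfields[chunk[0]] = chunk[1:]
        rec[tag] = subfields
    return rec


exported = to_exchange_text(record)   # what a central store would send out
imported = from_exchange_text(exported)  # what a receiving library would load
print(exported)
print(imported == record)
```

Because both libraries agree on the same tags and layout, a record catalogued once at a central store can be loaded elsewhere without re-keying, which is the point the paragraph above makes.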


Library automation systems became firmly established 
and recognized as a beneficial technology for the librarian. 
As computer power increased and prices fell, 
the automation providers increased the scope of the library 
automation functions. This led to the integrated library 
management system (ILS). A typical ILS provides a 
cataloguing module, OPAC, circulation control module, 
purchasing module, serials management module, import 
module and reports module. Others may also include facilities 
for inter-library loans and other functions. These systems enable 
library staff to perform almost all of their functions “on-line”, 
often meaning that data entered in one part of an integrated 
system can be used again elsewhere, thus saving time and 
money and further ensuring accuracy. 
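
The reuse of data across modules can be sketched in a few lines. The class below is a toy model, not a description of any real ILS product: the class, method and field names are hypothetical, and only three of the modules named above (cataloguing, circulation, and OPAC, plus a simple report) are represented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Item:
    """One catalogued item; every module reads and updates this same record."""
    accession_no: str
    title: str
    borrower: Optional[str] = None


class IntegratedLibrarySystem:
    """Toy ILS: all modules share a single store of item records."""

    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}

    # Cataloguing module: the title is keyed in once, here.
    def catalogue(self, accession_no: str, title: str) -> None:
        self.items[accession_no] = Item(accession_no, title)

    # Circulation module: reuses the catalogue record instead of re-entering it.
    def issue(self, accession_no: str, borrower: str) -> None:
        item = self.items[accession_no]
        if item.borrower is not None:
            raise ValueError(f"{accession_no} is already on loan")
        item.borrower = borrower

    def return_item(self, accession_no: str) -> None:
        self.items[accession_no].borrower = None

    # OPAC module: searches the same records and shows live loan status.
    def search(self, word: str) -> List[Tuple[str, str]]:
        return [
            (item.title, "on loan" if item.borrower is not None else "available")
            for item in self.items.values()
            if word.lower() in item.title.lower()
        ]

    # Reports module: summarizes current loans from the same data.
    def loans_report(self) -> Dict[str, str]:
        return {
            item.accession_no: item.borrower
            for item in self.items.values()
            if item.borrower is not None
        }


ils = IntegratedLibrarySystem()
ils.catalogue("A001", "Libraries of the Future")
ils.issue("A001", "reader-7")
print(ils.search("future"))
print(ils.loans_report())
```

The title is typed once at cataloguing time and then appears, unchanged, in the OPAC search result and the loans report, which is how an integrated system saves time and avoids transcription errors.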


As the technology advanced, so did the library 
automation systems. The main focus was on improving the 
ability for the borrower to retrieve information from the library 
automation system. Command driven retrieval was replaced 
by menu-driven retrieval. OPAC terminals were set up in 
libraries with options to perform simplified search strategies. 
The majority of systems were either on mainframe computers 
or on vendor specific hardware. Access to the database was 
through an OPAC terminal in the library. 


The next advancement was to give desktop computer 
users access to the library over the organization-wide network. 
This meant that querying of the library database could be 
done remotely. Hitherto libraries had been running, and in 
some cases still are running, a suite of electronic online 
services for their patrons. There was access to the local 
catalogue. Online service provision was accessible by trained 
information scientists. Subject specialist libraries would often 
have a CD-ROM terminal set up to enable users to perform 
more specific content searches. 


Further advances in computer and 
telecommunication technologies have resulted in the 
emergence of the global village, which is characterised by the 
availability of electronic publications all over the world. 
Information Technology (IT) has ushered in a variety of media 
that can help library and information professionals in the efficient 
and effective acquisition, organization and dissemination of 
information. Computers and CD-ROMs have found 
increasing acceptance in library and information centres 
(LICs); multimedia has shown much potential for LICs; and 
information networks have broken both time and space 
barriers to a great extent. At the same time, library and 
information professionals are deluged with advice on how to 
use these media to acquire and organize various learning 
resources and satisfy the complex and ever-increasing 
information needs of their users. 


Electronic media are helping libraries to increase their 
efficiency and effectiveness by providing information in all 
dimensions. Libraries need not own publications for them to be 
readily accessible. Many of the sources available in electronic 
form can be made available to the user community by 
developing electronic information access (EIA) facilities. 


The library as a primary institution for storing and 
disseminating information has, therefore, to take advantage 
of information technology and facilitate access to whatever 
information is needed and wherever it may be. For this 
purpose, it must modernize its operations and services with 
the help of information technology and have Internet 
connectivity and online information retrieval facilities. 


The point is that all of these services were, in almost all 
cases, being offered from different points of access. In fact, it 
has proved to be the systems departments that have indirectly 
led to changes in the way technology has affected libraries. 
Technology-led solutions gradually became popular and got 
widely used. 


The Internet has emerged as the most powerful medium
for storage and retrieval of information. With the
unprecedented growth in the quantum of knowledge worldwide
and its easy accessibility, the Internet has become an
unavoidable necessity for every library and information
center. The Internet is a global network of computer networks,
which has opened up unimaginable opportunities for storing,
manipulating and disseminating text, data and multimedia.


Librarians have not been slow to react to the enormous 
potential of the Internet as a resource provider. Above all,
librarians, like any other professionals, have to justify their 
service to their management in terms of quality and cost. 
Further, web technologies and database technologies have 
compelled library and information centers to use these 
technologies effectively to render services. With the growing 
number of e-resources, it has become imperative for 
information professionals to redefine their role in 
disseminating information to the users. The idea of converting 
library materials into digital formats for creating digital 
collections has advanced rapidly in the last few years, thus 
leading to the concept of a virtual library, a library without
walls, or the so-called digital library.


Simultaneously, the fundamental concept of
classroom learning is also gradually changing to telelearning.
Computers and related electronic resources have come to play
a central role in education. These resources have proved to
be of immense value in the teaching of subjects. The use of
electronic media is widespread all over the world. As a
result of the digital revolution, web pages and email have become
normal forms of communication between teachers and
students. Electronic tools make classes more efficient,
lectures more informative, and reading assignments more
extensive and accessible. Lectures are presented via
slides, statistical charts and tables, images, PowerPoint
presentations, graphics, video clips, etc. Virtual libraries are
available on the net to cater to the needs of students. They can
visit any virtual library and acquire any kind of information they
want from around the world with the click of a button. This is
the emerging scenario of the educational environment and of
libraries.


2.1. DEFINITIONS 


A library is more than a pile of books. A library adds 
value to information resources by organizing them and making 
them available. Additionally, a library serves distinct sets of 
stakeholders— communities of frequent, casual, and potential 
users. Unlike museums, it is seldom the materials in libraries 
that attract people but rather the ideas carried by the materials, 
the conceptual structures that support access, and the 
community of stakeholders who use the library. Because 
books and other physical information resources and people 
occupy physical space, libraries have evolved complexes of 
buildings, rooms, and mobile spaces in which books and other 
materials and people come together. These spaces are 
manifestations of the library as place. Place, however, is more 
than physical space - just as a home is more than a house. 
Places are defined by functions and communities, just as are 
the libraries. Places stimulate and can represent states of 
mind - it is easy to understand that when someone says “I
feel out of place,” they are not only referring to the physical 
space they occupy. Thus, places are as much about ideas 
and states of being as they are about physical space. This 
physical-conceptual continuum parallels what we mean by 
libraries— places that marry physical space with intellectual 
space, to link people to ideas and to each other. 


These fundamental characteristics of libraries - 
systematic access to information resources, the ideas 
represented by those resources, and sets of human 
stakeholders - also extend to digital libraries. Marchionini and 
Fox argue that digital libraries are the extensions and 
augmentations of physical libraries. We suggest that with 
respect to this broader notion of place both physical and digital
libraries are instantiations in different media of the same base 
type— the library. In either case, both physical and digital 
libraries occupy the physical-conceptual continuum with 
respect to ideas, materials, and the people. The distinction 
between physical and digital libraries is thus not always a 
clear one. For the sake of argument, physical libraries are
considered to maintain a collection of exclusively physical
materials, while digital libraries are considered to maintain a
collection of exclusively electronic materials. In between these
two extremes is the more typical physical library that maintains
digital components, such as digitized representations of
physical materials in its collection, or subscriptions to
databases or other electronic resources: what Buckland calls
the “automated library” and Rusbridge calls the “hybrid
library.” It is argued that digital libraries are extensions toward
the conceptual side of this continuum, and that the incorporeal 
nature of digital libraries makes them suitable replacements 
for physical libraries in meeting people’s information needs. 
Then, what is a Digital Library? Now let us try to define digital 
libraries. 


There are many buzzwords for it, some of them referring to
only remotely related activities. They include, but are not
limited to: multimedia database, information mining, information
warehouse, information retrieval, on-line information
repositories, electronic library, operational image applications,
imaging, World Wide Web (WWW) and Wide Area Information
Servers (WAIS).


There are many definitions of a “digital library.” Terms
such as “electronic library” and “virtual library” are often used
synonymously. The elements that have been identified as
common to these definitions are: the digital library is not a
single entity; the digital library requires technology to link the
resources of many; the linkages between the many digital
libraries and information services are transparent to the end
users; and digital library collections are not limited to
document surrogates; rather, they extend to digital artifacts
that cannot be represented or distributed in printed formats.


Hence, a digital library or electronic library is a library
which exists solely in electronic form, not on paper. Such
libraries provide access to
electronic information in a variety of remote locations through
a local online catalogue or other gateway, such as the Internet.
An electronic or digital library can also be described as an
annotated, frequently updated subject guide to online resources.
There are other definitions, such as:


Nurnberg aptly states that: 


• From an information retrieval point of view, it is a large
database.

• For people who work in hypertext technology, it is one
particular application of hypertext methods.

• For those working in wide-area information delivery, it
is an application of the Web.

• And for library science, it is another step in the
continuing automation of libraries that began over 25
years ago.


There is no doubt that a digital library is all of the above.
Will this be enough? Not really. Cleveland provides some
working definitions. Digital libraries are libraries with the same
purposes, functions, and goals as traditional libraries:
collection development and management, subject analysis,
index creation, provision of access, reference work, and
preservation. A narrow focus on digital formats alone hides
the extensive behind-the-scenes work that libraries do to
develop and organize collections and help users to find
information. The following characteristics of a digital library
may support this working definition:

• Digital libraries are the digital face of traditional libraries
that include both digital collections and traditional, fixed
media collections. So they encompass both electronic
and paper materials;

• Digital libraries will include digital materials that exist
outside the physical and administrative bounds of any
one digital library;

• Digital libraries will also include all the processes and
services that are the backbone and nervous system of
libraries. However, such traditional processes, though
forming the basis of digital library work, will have to be
revised and enhanced to accommodate the differences
between new digital media and traditional fixed media;

• Digital libraries ideally provide a coherent view of all of
the information contained within a library, no matter its
form or format;

• Digital libraries will serve particular communities or
constituencies, as traditional libraries do now, though
those communities may be widely dispersed throughout
the network; and

• Digital libraries will require the skills of both librarians
and computer scientists to be viable.


The concept of digital libraries is evolving over time.
Moreover, different communities are active in the area of
digital libraries with competing visions. Borgman points out
that the term digital library is used in at least two senses:


- In the computer science research community, digital
libraries are viewed as content collected on behalf of
users.

- In the library practitioner community, digital libraries are
seen as institutions providing a range of services in a
digital environment.


While most digital library projects fall into the first
category, speculation about future developments
concentrates on versions of the second.


The NSF/ARPA/NASA Digital Library Initiative describes
a digital library as follows:


“Information sources accessed via the Internet are 
ingredients of a digital library. Today, the network connects 
some information sources that are a mixture of publicly 
available (with or without charge) information and private 
information shared by collaborators. They include reference 
volumes, books, journals, newspapers, national phone 
directories, sound and voice recordings, images, video clips, 
scientific data (raw data streams from instruments and 
processed information), and private information services such 
as stock market reports and private newsletters. These 
information sources, when connected electronically through 
a network, represent important components of an emerging, 
universally accessible, digital library.” 


This definition asserts that all network-accessible
information is included in digital libraries. Moreover, digital
libraries are described as emerging and universally accessible.
However, this definition does not cover all the components of
a digital library.


According to Gladney, H.M., et al.:


“A digital library service is an assemblage of digital 
computing, storage, and communications machinery together 
with the software needed to reproduce, emulate, and extend 
the services provided by conventional libraries based on paper 
and other material means of collecting, storing, cataloguing, 
finding, and disseminating information.” 


According to them, a digital library is a machine-readable
representation of materials that might be found in a
conventional library. Along with this representation, organising
information is also available to assist users in finding specific
information.


The Association of Research Libraries (ARL) has identified
the following five elements in various definitions of digital
libraries:


- The digital library is not a single entity;

- The digital library requires technology to link the
resources of many;

- The linkages between the many digital libraries and
information services are transparent to the end users;

- Universal access to digital libraries and information
services is a goal;

- Digital library collections are not limited to document
surrogates; they extend to digital artefacts that cannot
be represented or distributed in printed formats.


This definition introduces the concept of distributed and
linked resources. The digital resources are the collections
and information services. The digital collections comprise
digital surrogates, non-printable objects and digital artefacts
that cannot be distributed in print form. Paul Duguid has defined
the digital library as more than a digital collection with
information management tools: it is an environment that
supports the life cycle of information.


The concept of a “digital library” is not merely equivalent 
to a digitized collection with information management tools. 
It is rather an environment to bring together collections,
services, and people in support of the full life cycle of creation, 
dissemination, use, and preservation of data, information, and 
knowledge. 


A more recent definition is given by Deegan and Tanner,
who offer a set of defining principles that seem
unarguable:


• A digital library is a managed collection of digital objects.

• The digital objects are created or collected according
to the principles of collection development.

• The digital objects are made available in a cohesive
manner, supported by services necessary to allow users
to retrieve and exploit the resources just as they would
any other library materials.

• The digital objects are treated as long term stable
resources and appropriate processes are applied to
them to ensure their quality and survivability.


But the meaning of the term “digital library” is less 
transparent than one might expect. The words conjure up 
images of cutting-edge computer and information science 
research. They are invoked to describe what some assert to 
be radically new kinds of practices for the management and 
use of information. And they are used to replace earlier 
references to “electronic” and “virtual” libraries. 


The partner institutions in the Digital Library Federation 
(DLF) realized in the course of developing their program that 
they needed a common understanding of what digital libraries 
are if they were to achieve the goal of effectively federating 
them. So they crafted the following definition, with the 
understanding that it might well undergo revision as they 
worked together: 


Digital libraries are the organizations that provide the
resources, including the specialized staff, to select, structure,
offer intellectual access to, interpret, distribute, preserve the
integrity of, and ensure the persistence over time of collections
of digital works so that they are readily and economically
available for use by a defined community or set of
communities.


This is a full definition by any measure and a good 
working definition because it is broad enough to comprehend 
other uses of the term. Other definitions focus on one or more 
of the features included in the DLF definition, while ignoring 
or de-emphasizing the rest. For example, the term “digital 
library” may refer simply to the notion of collection, without 
reference to its organization, intellectual accessibility, or 
service attributes. This is the particular sense that seems to 
be in play when we hear the World Wide Web described as a 
digital library. But the words might refer as well to the 
organization underlying the collection, or, even more 
specifically, to the computer-based system in which the 
collection resides. The latter sense is most clearly in use in 
the National Science Foundation’s Digital Libraries Initiative. 
Yet again, institutions may be characterized as digital libraries 
to distinguish them from digital archives when the intent is to 
call attention to the differences in the nature of their 
collections. 


The DLF’s definition of “digital library” does more than 
simply enumerate features. It serves in addition as the basis 
for the DLF’s perspective on the scope of digital libraries and 
on the functional requirements for their development. A brief 
consideration of certain features of the definition will help to 
explain its significance to the DLF. 


Organizations that provide the Resources : Digital 
libraries are organizations that employ and display a variety 
of resources, especially the intellectual resources embodied 
in specialized staff, but they need not be organized on the 
model of conventional libraries or even within the context of 
conventional libraries. Though the resources that digital 
libraries require serve functions similar to those within 
conventional libraries, they are, in many ways, different in
kind. For example, for storage and retrieval, digital libraries 
are dependent almost exclusively on computer and electronic 
network systems and systems-engineering skills rather than 
on the skills of traditional cataloguers and reference librarians. 


Far from emulating the organization of conventional 
libraries, the organization and structure of digital libraries, and 
the division of labour within them, are open to considerable 
experimentation. For example, as publishers and professional 
societies disseminate works electronically, they are testing 
how far their investments should incorporate the full range of 
library functions. When digital libraries license content from 
publishers and professional societies that manage their own 
repositories, they are, in effect, outsourcing the library storage 
function and experimenting with distributed repositories. 
Further, new organizations appear regularly in the form of 
small, entrepreneurial, cottage-like industries that scholars, 
laboratories, and others have developed to create, manage, 
and disseminate bodies of digital information critical to a 
discipline or set of disciplines. 


Preserve the Integrity of and ensure the Persistence: 
Each of the functions enumerated in the working definition of
“digital library” (select, structure, offer intellectual access,
interpret, distribute, preserve integrity, and ensure
persistence) is subject to the special constraints and
requirements of operating in a rapidly evolving electronic and
network environment. The continual change in the
environment means that the latter two functions, preserve 
integrity and ensure persistence, are especially difficult to 
achieve. But the DLF regards these functions as central to 
the concept of digital library and follows the Task Force on 
Archiving of Digital Information in identifying them as linked 
but distinct. The task force argued that the integrity of digital 
objects is measured in terms of content, fixity, reference, 
provenance, and context. But it argued as well that the 
preservation of object integrity, though necessary, is not a
sufficient condition of persistence. Persistence depends on 
other factors as well— organizational will, financial means, 
and the negotiation of legal rights. 
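
Of the five integrity attributes named above, fixity is the one most easily illustrated in practice. The following is only a minimal sketch, written in Python, of how a digital library might record a checksum when an object is ingested and compare it later to detect alteration. The choice of the SHA-256 algorithm and the function names fixity_digest and verify_fixity are assumptions made for illustration; neither the DLF nor the Task Force prescribes them.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest to serve as the object's fixity value.

    SHA-256 is an illustrative assumption, not a requirement of the source.
    """
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """Report whether the object still matches the digest recorded at ingest."""
    return fixity_digest(data) == recorded_digest


# Hypothetical digital object; in practice this would be the bytes of a file.
original = b"Manual of Digital Libraries, sample page"

# The digest is computed once at ingest and stored alongside the object.
recorded = fixity_digest(original)

print(verify_fixity(original, recorded))         # unchanged copy
print(verify_fixity(original + b"!", recorded))  # altered copy
```

A check of this kind addresses fixity only. Content, reference, provenance and context, as well as the organizational, financial and legal conditions of persistence, lie outside what a checksum can establish.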


Collections of Digital Works : Distinctions among 
libraries commonly focus on the subject matter that defines 
the collections (e.g., medical, art, science, music, and such) 
or on the communities interested in the collected materials 
(e.g., research, college, public). The DLF is convinced that, 
as digital libraries mature, the principle defining their collection 
policies will not be the “digital-ness” of the material. Rather, 
the defining principles will be, as in other libraries, the subject 
matter of the materials and the patron community interested 
in them. The key strategic question for digital libraries 
anticipating such a development will be how to integrate 
collections of materials in digital form with materials in other 
forms. 


Readily and Economically available : Like other 
organizations, digital libraries need to develop criteria for 
measuring their performance in an evolving and highly 
competitive environment. These criteria must, at a minimum,
reflect the functional attributes of a digital library. One essential
measure of the quality of service evaluates performance in 
terms of cost. Although the costs of digital library service are 
not yet well understood, the DLF appreciates that successful 
digital libraries have a sure grasp of critical cost factors and 
work quickly to economize the influence of those factors. A 
second essential measure of service quality takes account 
of how willingly and how responsibly a digital library makes 
information available to its patron communities. 


Use by a Defined Community or Set of Communities: 
Libraries in general, and digital libraries in particular, are 
service organizations. The needs and interests of the 
communities they serve will ultimately determine the trajectory 
of development for digital libraries, including the investment
they make in content and technology. Most of the libraries in 
the DLF are dedicated to supporting higher education and 
research, and they justify their investment in digital 
developments as a powerful means of realizing the larger 
institutional goals of the academic communities they serve. 


So we can say that a digital library is a library in which 
a significant proportion of the resources are available in 
machine-readable format as opposed to print or microform, 
accessible by means of computers. The digital content may 
be locally held or accessed remotely via computer networks. 
The process of digitization began with the catalogue, moved 
to periodical indexes and abstracting services, then to 
periodicals and large reference works, and finally to book
publishing, so that these libraries could provide access to their
users.


2.2. CHARACTERISTICS OF DIGITAL LIBRARIES 


Basically there are three important features which 
characterise digital libraries : 


1. Documents: Digital library collections contain fixed, 
permanent documents; 


2. Technology: Digital libraries are based on 
technology; and 


3. Work: Digital libraries are to be used by individuals
working alone.

Let us try to describe the characteristics of
the digital library and to do so independently of the library
environment (academic, national, public or commercial).


• Access to the digital library is not bounded in space or
time. It can be accessed from anywhere at any time.

• Content in electronic form will steadily increase and
content in printed form will decrease.

• Contents may be in textual, image, and sound form.

• Usage of electronic information as a proportion of total
usage will steadily increase, and usage of printed
material as a proportion of total usage will decrease.

• Expenditure on electronic material will steadily increase
and, relatively, expenditure on printed material will
decrease.

• Expenditure on information will shift from ownership to
subscription and licensing.

• Expenditure on equipment and infrastructure will
increase.

• Usage of buildings will shift from stockholding to places
for study, animation and citizenship.

• Jobs, training and recruitment will have to be re-profiled.


Chowdhury and Chowdhury have noted the following
major characteristics:


• Variety of digital information resources.

• Digital libraries reduce the need for physical space.

• Users may be at remote locations.

• Users may build their own personal collections using the
facilities provided by the digital library.

• Provide access to distributed information resources.

• The same information resource can be shared by many at
the same time.

• Paradigm shift in both use and ownership.

• Collection development is based on potential
usefulness, and appropriate filtering mechanisms are
followed to cope with the problem of plenty.

• Ability to handle multilingual contents.

• Presupposes the absence of human intermediaries.

• Should provide better searching and retrieval facilities.

• Digital information can be used and viewed differently
by different people.

• The digital library breaks the time, space and language
barriers.


Thus, a digital library is expected to possess the above
features to provide better services to its users in a global
environment. The focus of libraries, therefore, should be the
acquisition, organisation and dissemination of knowledge and
information, rather than the medium. With the digitisation of
knowledge and information, both have become fluid and
separated from the container. As such, there is every
likelihood of change in the role and status of the traditional
library.


With the advent of digital technology, the possibility of 
horizontal and vertical integration of knowledge and 
information has increased significantly. Historically, 
communication of knowledge and information has passed 
through three phases, i.e., speech, script, and printing. 
The ‘Information Superhighway’, as an outcome of
‘electronic-skywriting’, is the fourth revolution, which has
provided us with the missing link for instant transfer of
knowledge and information.


Now it is possible to transmit large quantities of voice, 
video and data across the globe instantly. Now, digital 
information resources include not only rapidly growing 
collection of electronic full text resources, but also images, 
video, sound, and even objects of virtual reality. The most 
significant shift in building digital collections is greater 
interoperability among information systems across the country 
and internationally. With the technology available at an
affordable cost, libraries are initiating small digitisation
projects individually or as groups of libraries.


However, building up digital collections and the infrastructure
required to access them is a challenge that every library has
to deal with. Today’s digital libraries are built around Internet
and web technologies with electronic journals as their building
blocks. The increasing popularity of the Internet and
developments in web technologies are catalysts for the concept
of the digital library. Fig. 2.1 is a pictorial representation of
digital library infrastructure and the services that can be
generated from it for the benefit of users of information.


[Figure 2.1: diagram not reproduced. Labels recoverable from the
original: Technological and Cultural; Collection Infrastructure
(Digital Resources); Access Infrastructure (Search, Browse, Portals);
Computing & Network Infrastructure; Digital Resource Organization
(Standards, Protocols, Access Control); Modified Web-based Library
Services (OPAC to WebPAC; CD-ROM to Web Databases; Manual to
Real-time Digital Reference Service; Manual to Electronic Document
Delivery Service); New Web-based Library Services (Virtual Library
Tours; Library Web Sites; Library Portals; Web-based User Education;
FAQ; Library Calendar; Web Forms; Bulletin Boards, Discussion
Forums and Listservs).]


Fig. 2.1. Digital Library Infrastructure and Services


It is hoped that, as libraries continue in an era of
constant change, under pressure to deliver value-added
services while continuously improving the quality of these
services, they will periodically rethink their core
values and bring into awareness new values that match
users' needs. Direct user access to information in digital format
and the provision of essential services through computer network
environments are two powerful emerging phenomena for
which librarians must evolve a set of values that
will shape services in libraries for the next several years.
Continued integration of paper and electronic technologies,
creation and support of holistic computing environments,
delivery of reference and instructional services over the
network, special efforts to make the technology work for all
users, and partnering across administrative lines all build on the
traditional reference values of personal service and equity of
access, in support of more contemporary notions of direct user
access to information and services in a networked
environment. These efforts need to be watched and improved
regularly.


2.3. FRAMEWORK FOR DIGITAL LIBRARY 


A framework for thinking about the notions of place and
library is provided first. The issue of materials and the ideas
they represent is then considered, looking in turn at physical
space, collection development, preservation, maintenance, and
reference services. Places for people are
considered next, including issues of people’s sense of place
in physical and digital spaces. Finally, the issues of physical
and digital spaces as places for work, collaboration, and
community-building are taken up. Physical and digital libraries
are juxtaposed here in order to sharpen the similarities as well
as the differences. The goal is to demonstrate that in many
ways digital libraries really are places in the conceptual sense, 
and will continue to broaden and enrich the roles that libraries 
play in people’s lives and in the larger social milieu. 


Kalay and Marx suggest that place-making is a 
deliberate process involving “arranging or appropriating 
objects and spaces to create an environment that supports 
desired activities, while conveying the social and cultural 
conceptions of the actors and their wider communities”. Based 
on this conception of place, we postulate that there are three 
key elements for thinking about place, and specifically about 
libraries as places: (1) the physical-conceptual continuum, 
(2) the people who hold stakes in the place, and (3) the 
functionalities that bring people to the place. By the physical-
conceptual continuum, we mean the range of concreteness
of a space— from the actual physical reality one finds one’s 
body in, to a physical space that reminds one of another 
physical space, to simulations of actual physical spaces that 
media such as television and film evoke, to representations 
of imagined physical spaces that new media such as virtual 
reality evoke, to the mental places we can imagine. The 
stakeholders in a library include the users, the librarians, the 
library administration, the town government or institutional 
management, the community at large, and others. The key 
functionalities for libraries as place are— the selection of ideas 
as manifested in materials for inclusion, the preservation of 
these ideas, and the creation and use of organizational 
structures to support access. 


The concepts of place and library both have physical 
and conceptual senses and that we can consider a library as 
a place either at the literal or at the conceptual extremes of 
the continuum can be claimed in this framework. The people 
who have a stake in the library as place are likely to be similar 
whether the library is physical or digital, but some people may 
adopt exclusive stakes in either. Because many physical 
libraries have digital components, and digital libraries are often 
associated with physical libraries, we believe that this overlap 
in stakeholders is significant. Ideas are not space dependent 
but are manifested in materials that require matter or energy. 
We refer to the manifestations that require mainly matter
as physical and to those that require mainly energy as virtual or
digital materials. We distinguish the idea, or the ‘work’, from
its manifestation in material form. These artifacts may
be physical or virtual. We use the term ‘material’ to emphasize 
that these manifestations are made by humans and are not 
naturally occurring. Finally, the stakeholder interacts with the 
library as place in what we term an experience. It is postulated 
that physical and digital libraries differ most in what types of 
materials are selected for inclusion, and largely overlap in
the ideas that are manifested by those materials. Whether
the form of expression is physical or digital in turn strongly 
impacts selection, preservation, access, and the user 
experience. These overlaps and distinctions provide ways to 
think about physical and digital library places and in turn how 
people experience libraries. 


Figure 2.2 presents a view of the library as a place, 
which encompasses both physical and digital libraries. The 
focus of this model is the space itself, whether a physical or a 
digital space. The space must exist prior to any of the other 
elements of this model coming to pass— prior to materials 
being collected and stored in it, and prior to people coming to 
it and experiencing it. Materials, whether physical or digital, 
are brought into the space through the process of collection 
development, and the space functions as storage for these 
materials. Additional value is added to the space through 
organizational structures that facilitate access. Materials must 
be brought into the space prior to users coming to the space, 
since, without these materials, there would be no reason for 
users to use the space except as a meeting place, which is 
not sufficient to distinguish a library from many other social 
spaces. Users then come to the space, physically or virtually,
to use the materials stored in it. This use, situated in the space,
causes the user to have an experience of the space, the 
materials, any other library stakeholders present, and the 
ideas embodied in the materials. Underlying the rest of this 
model is the fact that the space may be either physical or 
virtual, that is, the library may be either a physical or a digital
library, which in turn dictates certain characteristics of the 
other elements of the model, such as the format of materials 
stored in the library, the media by which users may “come to” 
the library, and the experience that users are likely to have of 
the library. Presumably, the overall experience leads to new 
ideas that may themselves in turn become part of the library 
space. 


[Figure 2.2 shows the elements of the model: Ideas; Materials
(Physical, Virtual); Space (Physical, Virtual); Stakeholder(s).]


Fig. 2.2. Model of the Library as Place


Lakoff and Johnson discuss the metaphors that we use 
and take for granted in our everyday use of language. They 
divide these metaphors into two types— structural metaphors, 
in which one concept is portrayed in language in terms of 
another concept (e.g., argument in terms of war); and 
orientational metaphors, in which a concept is portrayed in 
spatial terms (e.g., happiness in terms of up). They argue 
that these metaphors are not merely figures of speech, but 
have their basis in our everyday experience, and in turn deeply 
influence our conceptualizations of the concepts to which they 
are applied. The idea of the digital library as place is, to a 
certain extent, an orientational metaphor— a library building 
is a physical place, but a digital library is not. There is nothing 
inherently spatial about a digital library, or more fundamentally 
about the information objects contained in a digital library’s 
collection. Users of digital libraries may perceive their 
interaction with information in spatial terms, as users of 
physical libraries are forced to interact with information in 
spatial terms due to the layout of the library itself. This 
orientational metaphor of the digital library as a space - and 
further, of the “information space” - is one way to think about 
the physical-conceptual continuum for places and libraries,
and the farther toward the conceptual side of the continuum 
we get, the more important metaphors become for guiding 
actions in the space. 


2.3.1. Spaces for Materials 


Space is an important characteristic of a physical library; 
for example, the amount of floor space and shelf space is 
critical. Too little space may, for example, prompt the library 
to install compact shelving to maximize the use of existing 
space, to build an addition onto the building to expand existing 
space, to move materials into an auxiliary space such as off-site
storage, or to build a new building to create an entirely
new space. The issue of space is important in the physical 
world because the amount of material that can exist in a 
physical space is constrained by the size of the space and 
the size of the materials. Thus, physical space strongly 
influences most physical library functions. 


This is of course not the case for virtual “space” and 
electronic materials. Electronic materials occupy no physical 
space - or, if electronic materials can be said to occupy any 
physical space at all, they occupy the space required for the 
disk array on which they reside. The benefit of electronic 
materials, from the perspective of physical space, is thus that 
a quantity of physical materials that would occupy a large 
physical space may be stored in a much smaller physical 
space, if converted to an electronic form. For example, as of 
this writing, high-end desktop computers come standard with 
hard drives of up to 800 GB capacity, which is a quantity of 
data equivalent to over eight thousand meters of shelved 
printed textual materials. Moore’s law states, loosely, that the 
amount of processing power that can be contained on a 
microchip doubles every 18 months, and a corollary to 
Moore's law states that the capacity of computer memory 
systems increases at approximately the same rate. Thus, not 
only can a greater quantity of materials be stored electronically
than physically in the same physical space, but this quantity 
of electronic materials is ever-increasing. 
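
The arithmetic behind this comparison can be sketched in a few lines of Python. This is only a rough illustration: it takes the 800 GB and eight-thousand-metre figures quoted above at face value and assumes a strict doubling every 18 months. The function names are ours and are not drawn from any library.

```python
# Figures taken from the text: an 800 GB drive is said to hold the
# equivalent of roughly 8,000 metres of shelved printed text.
DRIVE_CAPACITY_GB = 800
SHELF_EQUIVALENT_M = 8_000

# Implied shelf length per gigabyte (an inference from the two figures above).
METRES_PER_GB = SHELF_EQUIVALENT_M / DRIVE_CAPACITY_GB

# The loosely stated doubling period quoted in the text.
DOUBLING_PERIOD_MONTHS = 18


def projected_capacity_gb(start_gb, months, doubling_months=DOUBLING_PERIOD_MONTHS):
    """Capacity after `months`, assuming it doubles every `doubling_months`."""
    return start_gb * 2 ** (months / doubling_months)


def shelf_metres(capacity_gb):
    """Length of shelving whose printed text would fill `capacity_gb`."""
    return capacity_gb * METRES_PER_GB


if __name__ == "__main__":
    for years in (0, 3, 6):
        gb = projected_capacity_gb(DRIVE_CAPACITY_GB, years * 12)
        print(f"after {years} years: {gb:.0f} GB, about {shelf_metres(gb):.0f} m of shelving")
```

Under these assumptions, two doublings (three years) would take the same drive bay from 800 GB to 3,200 GB, that is, from about 8,000 to about 32,000 metres of shelved text, while the physical footprint stays the same.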


Over eight thousand meters of shelving is the physical 
space that would be occupied by these hypothetical physical 
materials if these materials were stored locally, and the space 
required for a disk array is the physical space that would be 
occupied by the corresponding electronic materials stored 
locally. The location of storage, then, raises an additional issue 
regarding space— materials stored locally take up space 
locally, but materials stored remotely, from the user’s point 
of view, occupy no space at all. For example, if one wants a 
specific book that is not held in one’s local library’s collection, 
all one needs to do is to place an interlibrary loan request: one fills
out a form, a librarian places a request with the interlibrary
loan consortium of which one’s local library is a member, and 
some remote library ships the book to the local library. The 
book requested on interlibrary loan exists in the world and 
occupies space somewhere, but it occupies no space in the 
local library. Thus, when the book arrives, one has gained 
access through the local library to material that occupies no 
space in the local library. 


Interlibrary loan raises the issue of ownership versus 
access in physical libraries, and demonstrates that it is not 
necessary for a library to own materials in order to provide 
access to them. The role of place where materials are 
physically stored is one of the traditional roles of the physical 
library. To fulfill this role, the physical library had to own these 
materials, and this came with all of the responsibilities of 
ownership: the initial decision of what is worth owning, and
the long-term care and curation of those materials. Physical 
libraries have in the past few decades been experiencing a 
shift towards increasing access rather than ownership. This 
shift has come as libraries have subscribed to publisher-provided
and third-party databases to gain access to journals
in electronic format that the library may not subscribe to in 
print, and as libraries have subscribed to services such as 
NetLibrary (www.netlibrary.com) which make the full text of 
books available electronically that libraries may not own in 
print. This trend will likely continue to accelerate due to efforts 
such as the Google Books Library Project
(books.google.com/googleprint/library.html) and the Open Content Alliance
(www.opencontentalliance.org). There has been considerable 
backlash in recent years in the library and academic 
communities against journal publishers in response to the 
“serials crisis” of increasing costs for consistent or even 
decreasing access. Whatever the future of library-publisher 
relations holds, however, as increasing numbers of books 
and journals are published electronically, libraries will 
increasingly provide access to materials that they do not own. 
Digital libraries too own materials, as do physical libraries 
that digitize components of their collections. Though these 
materials are in electronic formats, the issues of collection 
development and maintenance remain consistent. Thus, while 
providing access to materials owned by another institution is 
not a new role for physical libraries, the trend in physical 
libraries is to move further from the traditional role of the library 
as owner and curator of materials. Digital libraries, and 
physical libraries that maintain digital collections, on the other 
hand, continue to fulfill this more traditional role for the library. 


It may take from a few days to a few weeks to gain 
access to a book requested via interlibrary loan, because it is
stored remotely from the local library. Electronic materials 
stored remotely, on the other hand, are accessible far more 
rapidly; even with the slowest dial-up connection, accessing 
electronic materials stored remotely takes on the order of 
minutes, not days or weeks. Thus, the location of the space 
in which physical materials are stored affects the time it takes 
to access those materials, but that effect is dramatically 
reduced for digital materials. 


The physicality of electronic materials is complicated 
by the fact that a digital library may exist on one or more 
mirror sites - thus the digital library exists in several physical 
locations. Perhaps more importantly, the servers on which 
the digital library exists may be located anywhere on Earth 
(or even off Earth if the so-called “Interplanetary Internet” 
should come to pass (www.ipnsig.org)). Although increased 
physical distance means increased latency, the functionality 
offered by the digital library may be the same whether the 
server is in the room next door or on Mars. A more appropriate
approach to the “space” occupied by electronic materials, 
therefore, is not to consider the physical space occupied by 
electronic materials, but the virtual “space” occupied by such 
materials. 


Information Architectures for Physical and Digital 
Libraries : There are two classification schemes that are 
commonly used to organize materials— the Library of 
Congress and the Dewey Decimal classification schemes. 
These classification schemes, and others that serve similar 
purposes, are overlays on the physical space of the library, 
as well as over the materials themselves. A different 
classification scheme may be laid over the same physical
space, without altering the physical space. Indeed, this 
happens whenever a library switches from one classification 
scheme to another, as many libraries have from Dewey 
Decimal to Library of Congress over the past several decades. 
This process is extraordinarily labour-intensive, particularly 
for a large collection, but it does not necessarily require any 
sort of renovation to the physical space of the library. Though
it may be a renovation or some other large-scale change to
the physical space that motivates the change of classification
scheme, as in the currently ongoing reclassification from
Dewey to Library of Congress in Duke University’s Perkins
Library (www.lib.duke.edu/perkproj/), the physical space and
the classification scheme utilized within that space are
logically separate.


When a user comes into a physical library, one possible 
strategy that he could employ to find the materials that he is 
looking for is to search in a card or online catalogue, determine 
the call numbers for these materials, and then find those 
materials on the shelves, which are organized according to 
call number. The call number is the point of connection 
between the physical space and the classification scheme 
that overlays it— the call number orients both the physical 
materials and the user of those materials within the physical 
space, and to a limited extent unites the ideas manifested in 
the materials to the space of ideas manifested in the 
classification system. 


In a digital library, on the other hand, the “space” itself 
is defined by the classification scheme (though frequently
this scheme is neither Library of Congress nor Dewey
Decimal). When a user comes to a digital library, he is
presented with an interface. Principles of information 
architecture suggest that this interface should reflect the 
classification scheme according to which the digital library 
collection is organized. If the digital library’s collection is 
reorganized according to a different classification scheme, 
then the interface itself should change to reflect that. While a 
physical space and a classification scheme are logically 
separate, a digital “space” and a classification scheme are of 
a piece— the classification scheme to a certain extent defines 
the digital space. When the classification scheme utilized in 
a digital space changes, the user’s interaction with that space 
changes. A digital library may even employ multiple 
classification schemes and allow each user to select the 
scheme he prefers. 


More importantly, whereas physical materials can only 
be placed in one location, digital materials can be ‘placed’ in 
multiple locations. Kwasnik writes that mutual exclusivity is a
requirement of traditional classification schemes: a given
entity can belong to one and only one class. The requirement
of mutual exclusivity in classification schemes may be an
upshot of the fact that physical artifacts, such as books, can
only exist in one place, such as on one shelf in a library.
Kwasnik admits, however, that the principle of mutual 
exclusivity makes realistic representation of knowledge 
difficult — ideas do not lend themselves to being so narrowly 
pigeonholed. The flexibility of digital space to accommodate 
multiple classification schemes simultaneously allows a digital 
library to better accommodate the multiple ideas manifested 
in artifacts. Thus, a key difference between physical and digital 
libraries is that physical libraries typically provide only a few 
access points for materials that are then retrieved from 
specific physical locations, whereas digital libraries provide 
a multiplicity of indexes for access from potentially many 
locations. 
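
The contrast between a single shelf location and multiple access points can be illustrated with a minimal Python sketch. The records, identifiers and class labels below are invented for illustration, and the two functions are hypothetical helpers rather than part of any cataloguing system.

```python
from collections import defaultdict

# Hypothetical records: identifiers, titles and class labels are invented
# purely for illustration.
ITEMS = {
    "item-1": {"title": "Maps of the Ancient Olympics",
               "classes": ["history", "sport", "cartography"]},
    "item-2": {"title": "Sheet Music Collection",
               "classes": ["music", "history"]},
}


def shelve(items):
    """Physical model: mutual exclusivity, so each item gets exactly one
    location (here, arbitrarily, its first class)."""
    return {item_id: rec["classes"][0] for item_id, rec in items.items()}


def index(items):
    """Digital model: each item is reachable from every class it belongs to."""
    by_class = defaultdict(list)
    for item_id, rec in items.items():
        for cls in rec["classes"]:
            by_class[cls].append(item_id)
    return dict(by_class)
```

Here shelve(ITEMS) places "item-2" under "music" only, whereas index(ITEMS) lists it under both "music" and "history". The item itself is stored once; only the access points multiply.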


Rosenfeld and Morville suggest that information 
architecture is fundamentally an exercise in classification— 
the task of the information architect is to create a sensible 
and intuitive classification scheme according to which 
information will be organized and made available to users. It 
is this classification scheme that dictates both the organization 
of a website, and users’ interaction with it. Jakob Nielsen 
suggests that the structure of a website helps users to 
navigate within that site (www.useit.com/alertbox/20000109.html).
This is one of the underlying principles of
information architecture— that classification schemes are 
important in presenting electronic data. The underlying 
metaphor of information architecture is obviously that of 
physical architecture— the orientational metaphor of 
“navigation” or “movement” through information is employed 
as analogous to moving through physical spaces. 


On the other hand, instead of navigating through menus
and links to find materials, the user could use the digital
library’s search engine if one is provided. In this way, the
user may retrieve a set of materials from the digital library’s
collection matching certain criteria. This is of course also
possible, but considerably more difficult, in a physical library:
retrieving a set of materials from a physical library’s collection
requires the user to engage in the two-step process of first
determining the call numbers of the materials to be retrieved,
and then of moving through the physical space to collect those
materials manually. A search of a digital library, on the other
hand, may provide the user with a citation and a link directly
to an information resource, as in a search engine, or with
the actual resource itself, as in a full-text database.


Digital libraries allow the same two methods for 
accessing materials as are allowed in a physical library - 
browsing and searching - but the user’s experience in
employing these methods is entirely different in the physical
and digital realms. First, browsing is an efficient method of
accessing materials in a digital library, due to the integration
of the classification scheme and the “space” in a digital library.
On the other hand, in a physical library, the efficiency of
browsing is in large part dependent on the materials being
browsed: it would be highly inefficient to attempt to find one
specific book in a library by browsing, but browsing current 
serials is an extremely efficient means for scanning a large 
amount of information. Further, while searching may be the 
most efficient method of access in a physical library because 
it provides a point of connection between the classification 
scheme and the physical space, searching in a digital library 
bypasses both the classification scheme and the virtual space 
created by it. 
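
The difference between the two access methods in a digital library can be sketched as follows. This is a toy illustration under assumed data: the two-level classification tree, the document identifiers and the texts are all invented, and a real search engine would use an index rather than the simple scan shown here.

```python
# Hypothetical two-level classification tree; labels and identifiers are invented.
TREE = {
    "Arts": {"Music": ["doc-1"], "Photography": ["doc-2"]},
    "History": {"Ancient": ["doc-3"]},
}

# Hypothetical full text of each document.
TEXT = {
    "doc-1": "ragtime sheet music for piano",
    "doc-2": "landscape photographs of mountains",
    "doc-3": "athletes and music at the ancient olympics",
}


def browse(tree, *path):
    """Follow class labels down the tree, as a user clicking through menus would."""
    node = tree
    for label in path:
        node = node[label]
    return node


def search(texts, term):
    """Match on content alone; the classification tree is never consulted."""
    term = term.lower()
    return sorted(doc_id for doc_id, body in texts.items() if term in body.lower())
```

Browsing to Arts, then Music, returns only "doc-1", because the result is bounded by the class the user has walked into. Searching for "music" returns both "doc-1" and "doc-3", cutting across the classes and bypassing the "space" that the scheme defines.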


Let us now return to architecture, both physical and 
virtual. A physical library is a prisoner of its physicality. It is
difficult to alter the architecture of a physical library after it is
built: new construction must take place, or the existing
construction must be retrofitted with new objects, such as
compact shelving. A digital library, on the other hand, may 
be reconfigured comparatively easily after it is built. New 
“spaces” may be added or removed or altered, and new 
pathways through the space may be added. Thus, from the
library’s point of view, digital libraries offer more flexibility and
lower costs architecturally. From the user’s point of view,
however, the persistence and stability of physical architecture
help frequent and casual users to navigate in a “known”
space. Changes in physical or virtual spaces can be jarring 
and disruptive for people. This is a particular concern in 
today’s digital environment where people use relatively small 
screens to represent the entire underlying architecture. In 
effect, the entire virtual space of a digital library implodes in 
order to be represented on a desktop or PDA/cell phone 
display. 


In a physical library, special collections consist of 
subsets of the library’s entire collection, and the materials in 
special collections are generally classified in such a way as 
to differentiate them from the main collection and are stored 
in a space separate from the main collection. In a digital library, 
on the other hand, there may not be any need to separate 
out items in a special collection from items in the main 
collection, either logically or architecturally. No item in the 
collection is any more or less accessible than any other in a 
digital library. Whether the back-end of a digital library is a 
file structure or a database, all items contained therein may 
be referred to logically in the same way. This logical 
consistency allows for architectural flexibility— in effect, a 
digital library’s entire collection may be part of one or more 
special collections. 


Some digital libraries have, however, created special 
collections of sorts. At one extreme, ibiblio (www.ibiblio.org),
a contributor-run digital library, provides access to more than 
1,500 different collections that are fully independent. These 
are partitions of the full digital library space defined by the
topical areas of the entire set of contributors. Other digital 
libraries provide specific partitions that are not mutually 
exclusive and may serve as exhibits. Perseus Digital Library, 
for example, maintains an “exhibit” on Hercules 
(www.perseus.tufts.edu/Herakles/) and another on the 
ancient Olympics (www.perseus.tufts.edu/Olympics/). At the 
time of this writing, the American Memory Project was 
featuring the Ansel Adams and African-American Sheet Music 
collections (memory.loc.gov/ammem/). It is possible for a 
digital library to create a special collection, simply by providing 
easy access to a select subset of the resources in the 
collection. Indeed, any number of such special collections 
may be created by “slicing” the entire collection in a variety of 
ways. Ibiblio, Perseus, and American Memory have created
special collections by creating architecturally, if not logically,
separate collections. It is also comparatively easy to change
these special collections over time, in a way that is not easy
to do in a physical library. These exhibits or special collections
may also be maintained indefinitely since the virtual materials
can exist in multiple locations and thus are infinitely reusable,
without the costs of physical space. Creating or changing a
special collection in a physical library requires the physical
movement of materials; creating or changing a special
collection in a digital library requires the alteration or replication
of the links on webpages.
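
One way to picture “slicing” is to treat each special collection as a named selection rule over the single main collection. The sketch below is a simplified assumption of ours, not a description of how ibiblio, Perseus or American Memory are actually built; the items, tags and names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    item_id: str
    title: str
    tags: frozenset = field(default_factory=frozenset)


# Hypothetical main collection; every item is stored exactly once.
COLLECTION = [
    Item("a1", "Photograph of a mountain lake", frozenset({"photography"})),
    Item("a2", "Ragtime sheet music", frozenset({"music", "sheet-music"})),
    Item("a3", "Portrait photograph of a musician", frozenset({"photography", "music"})),
]

# A special collection is only a named rule for selecting items.
SPECIAL_COLLECTIONS = {
    "photography": lambda item: "photography" in item.tags,
    "music": lambda item: "music" in item.tags,
}


def slice_collection(name, collection=COLLECTION):
    """Return the identifiers of the items that belong to the named slice."""
    keep = SPECIAL_COLLECTIONS[name]
    return [item.item_id for item in collection if keep(item)]
```

In this sketch item "a3" appears in both slices without being copied or moved, and changing a special collection amounts to editing one rule, which parallels altering links on a webpage rather than moving materials between rooms.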


A physical library is a prisoner of its physicality, but a 
digital library is a prisoner of its technology. Physical space is
clearly more important in physical than in digital libraries, but 
digital libraries are more dependent on energy than are 
physical libraries. Without electricity, a modern physical library 
would give up some services but could maintain basic 
operations. A digital library is useless without the energy 
needed to transfer and display information. This tradeoff in 
crucial requirements in the physical and digital libraries reflects 
the nature of the matter or energy manifestations of ideas
contained in the respective materials. Thus, matter dominates 
physical libraries and energy dominates digital libraries and 
these differences sit along the physical-conceptual continuum 
of libraries as place. 


Thus, it seems that the requirements for and uses of 
physical space are dramatically different for physical and 
digital libraries. Physical libraries are bound by a more or less 
static physical architecture that dictates how physical
materials may be accessed, whereas digital libraries can 
leverage the spatial-temporal characteristics of digital 
materials to provide a variety of architectures that provide 
more control over collections but require energy to deliver 
this control. Physical libraries mainly depend on classification 
systems for locating materials in physical space, whereas 
digital libraries depend on classification systems for locating 
materials in conceptual space, allowing an item to have many 
classification ‘call numbers’ within a single classification
system, and allowing items to be found through a variety of 
different classification systems. Digital libraries are dependent 
on electricity for representation and are thus better suited to 
materials at the conceptual end of the continuum of library as 
place. 


Collection Development and Preservation : Two of the 
most fundamental issues faced by a library are storage and 
maintenance over time. Thus, decisions must be made
on what is the best use of available storage space and how 
the materials in the space can be preserved over time. These 
issues require two sets of decisions— what materials will be 
brought into the space, and what materials will be stored over 
the long term or removed from the space. In many respects, 
the issues involved in collection development are the same 
for physical and digital libraries, with respect to materials 
serving the library mission, the appraisal process, and costs 
associated with acquisition or licensing. One important
distinction between collection development in physical and
digital libraries, however, is that the physical library is 
constrained by physical space for the materials selected, 
whereas the digital library is more constrained by the materials 
available. The maintenance and preservation functions also 
differ significantly between physical and digital libraries. 


Storage Space : The decision concerning what 
materials will be brought into the storage space is known as 
collection development. The collection development process 
is fundamentally the same for both physical and digital 
materials— identifying candidate materials pertinent to the 
library mission and budget, assessing the quality of the ideas 
represented in the materials, and storing what can be stored 
given the available space. The costs involved in collection 
development are likewise the same for both physical and 
digital materials— there are costs involved in assessing and 
collecting materials, maintaining materials over time, and 
providing access to the materials to users. 


The form of the materials is less critical than the quality 
of the ideas represented in those materials, and librarians 
and other stakeholders will continue to invest considerable 
effort in assessing the value of the materials they include in 
their libraries. In a physical library, the effort to select and
acquire materials falls almost totally to librarians. In digital
libraries, the nature of digital materials allows users and other 
stakeholders to participate in selection and acquisition directly. 
How user contributions may fit into the overall quality 
assurance models that libraries have developed over the 
years, however, is still evolving.


There are two ways in which electronic materials may 
come into existence: an object may be “digitized” - that is, an 
electronic representation may be made of a physical object, 
by scanning it, digitally photographing it, or some other 
representation method - or an object may be “born digital” - 
that is, created electronically and not in a physical format.
Digitization, however, is expensive - so much so, in fact that 
many organizations currently award grants to support 
digitization efforts, as many institutions cannot afford to 
undertake such an effort without financial support. For 
materials that are unique, rare, or fragile, however, digitization 
may be the best way to ensure that these materials, or at least
alternative representations of the ideas manifested in these 
materials, are available to users, or even available at all in 
the long term. Born-digital materials, on the other hand, may 
be considerably less expensive to produce - it of course costs 
nothing to digitize these materials, and the cost to create 
certain types of materials digitally may be less than the cost 
to create comparable materials physically. Born-digital 
materials are, further, neither rare nor unique, since the cost 
of copying digital materials is so low as to be practically 
nonexistent. Both the human labour of creating and 
cataloguing, and the intellectual property costs, however, 
remain significant for both physical and digital materials. 


Thus, physical libraries must consider storage space 
limits in collection development, which in turn affects the user 
experience in the physical library as place, whereas digital 
libraries are less constrained by storage space but are 
constrained instead by energy sources that likewise affect 
the user experience in the digital library as place. Most 
importantly, both physical and digital libraries must appraise 
the value of works they wish to add to the collection, 
regardless of their representation form. Likewise, they must
consider the costs of acquisition and/or licensing in the 
collection, and the issue of maintaining and preserving the 
collection. 


Preservation : Once an object is in a digital format, the
costs involved in maintaining it over time are the same, 
whether it is a digital representation of a physical object or 
born digital. There are several hidden costs involved in 
maintaining digital objects over time. First, computer hardware
has a finite lifespan— the disk array on which a digital object 
is stored will eventually fail, or the server in which that hard 
disk is installed will eventually need to be upgraded. In these 
situations, the digital object will need to be migrated to another 
piece of hardware. Likewise, librarians or system 
administrators who manage the hardware must regularly 
attend to software upgrades. A second cost of the 
maintenance of digital objects is changes in formats. Over
time, formats for files change - new applications come to be 
used, new versions are released and functionality changes, 
and eventually applications cease to be used. This change in 
applications and functionality requires that over time, digital 
objects be “ported” to new formats that are compatible with
new applications. In response to this issue, Adobe Systems 
is currently attempting to position its Acrobat software as an 
archival standard for text documents, so that both the data 
within a document and the form of the document may be 
maintained in the long term. This is an attractive solution in 
many ways, but limited to documents and digital objects that 
emulate the format of paper, and perhaps not a viable solution
for digital objects in other formats, such as video or music or 
data sets. 


While physical objects of course also need to be 
maintained over time, the time spans on which they require 
maintenance are generally much longer than for digital objects.
The usable lifespan of digital media varies with the medium,
perhaps as little as one to two years for magnetic tape, five
to ten years for magnetic disks, and thirty or more for optical
disks, though estimates vary; the usable lifespan of
physical objects tends to be far greater. Many libraries own
books that are over one hundred years old and in perfectly 
readable condition; it is unlikely that any currently existing 
digital format will be so readily usable one hundred years 
from today. Other formats of physical objects have even 
greater life spans: sculpture, for example, properly
maintained, may last for hundreds or even thousands of years. 
Thus, the quality of the storage space also matters (e.g., 
temperature, humidity, etc.), and archivists are only beginning 
to have enough experience with digital storage media to 
determine environmental requirements. The current status 
of digital preservation is to plan for regular migrations with 
associated costs. 
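
The planning arithmetic implied by these lifespans can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the lifespans are the conservative ends of the ranges quoted above, and the rule of one migration at the end of each media lifespan is an assumption made here, not a recommendation from the preservation literature.

```python
import math

# Conservative usable lifespans in years, taken from the ranges quoted above.
MEDIA_LIFESPAN_YEARS = {
    "magnetic tape": 1,
    "magnetic disk": 5,
    "optical disk": 30,
}

def migrations_needed(horizon_years, lifespan_years):
    """Migrations required to keep an object usable for horizon_years.

    Assumption (illustrative only): the object is migrated once at the end
    of each media lifespan, and the first copy covers the first lifespan.
    """
    if lifespan_years <= 0:
        raise ValueError("lifespan must be positive")
    return max(0, math.ceil(horizon_years / lifespan_years) - 1)

def migration_plan(horizon_years):
    """Migration counts per medium for one planning horizon."""
    return {
        medium: migrations_needed(horizon_years, years)
        for medium, years in MEDIA_LIFESPAN_YEARS.items()
    }
```

Calling `migration_plan(100)` gives the number of migrations each medium would need for an object to stay usable as long as the century-old books mentioned above. Multiplying each count by an estimated cost per migration gives a rough budget figure, which remains an estimate.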


Whether an object is a physical object, a digital 
representation of a physical object, or born digital, one issue 
that is consistent is the decision whether to store the object 
over the long term or to remove it from storage— in archives 
this is the appraisal process, and in libraries this is the weeding 
process. This archival function is necessary in physical 
libraries to ensure that only the most valuable materials are
maintained in the collection over the long term. Value may 
be defined in any number of ways, such as those materials 
that are worth the most in monetary terms, the rarest or most 
unique items, or those items that receive the most use. The 
process by which this appraisal takes place is summed up 
neatly by Eastwood et al. as “study[ing] the records 
themselves and determin[ing] the various elements of them
that are likely to give them continuing value, for example, 
their usefulness for legal purposes, their value as evidence 
of the functioning and organization of their creator, or their 
potential for research”.


The larger the physical space a library occupies, of 
course, the more material can be maintained in the physical 
library's collection. Similarly, the greater the volume of the 
disk arrays on which a digital library resides, the more material
can be maintained in the digital library’s collection. In physical 
and virtual spaces alike, however, the larger the space and 
the more material contained in it, the greater the need for the 
space to be well organized and maintained, and the greater 
the demand on librarian time to husband the materials and
space.


Thus, both physical and digital libraries must consider 
how best to preserve materials and the costs of doing so when 
making collection development decisions. Librarians can learn 
from archivists about what is needed to preserve physical 
materials; librarians with digital assets are challenged to 
discover how to preserve digital materials as well as what 
the costs will be. The best solution is regular conversion and 
upgrades, the costs of which can be only estimated rather 
than accurately budgeted. 


2.3.2. Spaces for Stakeholders 


Spaces are the most important constituents of libraries; 
they are discussed below : 


The Physical Library as Place : Libraries have long 
served as important places for people to work, think, and 
collaborate. Many academic libraries are literally cathedrals 
of learning with impressive architectural features. In essence, 
library spaces are used as much to inspire ideas and feelings 
as they are to serve utilitarian functions. However, some new 
library places are beginning to directly address new library 
space models where the library is really a hybrid physical 
and digital place. For example, in describing the new health 
sciences library building at the University of Maryland, Weise 
writes, “we have done our best to provide [users] with services 
so they would not have to come to the library”. Likewise, 
projects are underway at the University of British Columbia 
and the California State University at Northridge to replace 
the existing academic library with a learning center where 
human-oriented workspaces are in one wing and the physical 
materials are stored in another wing, in temperature-controlled 
compact storage attended by Automated Storage and
Retrieval Systems (www.library.ubc.ca/home/asrs).


Two related social forces that have generated 
considerable thinking about libraries as places are the
popularization of the internet, and the appearance of large
chain bookstores. Interestingly, these two forces emerged in 
close temporal proximity in the mid-1990s. As anyone who 
has been to a shopping mall in the past decade knows, these 
bookstores contain comfortable chairs in lounge-like seating 
areas, children’s play spaces, and coffee bars. Essentially,
these bookstores attempt to fulfill the role of what Oldenburg 
refers to as “third places.” The first two places are the home 
and the workplace, while the third place is an exclusively social 
place: the town commons, the street corner, or the local pub, 
for example. Third places are “public places that host the 
regular, voluntary, informal, and happily anticipated 
gatherings of individuals”. As Coffman points out, these 
bookstores deliberately attempt to create third place-style 
environments by offering inviting surroundings and a schedule 
of events. 


The advent of such “third place” bookstores led Coffman
to pose the question, “what if you ran your library like a
bookstore?” The Multnomah County Library in Oregon was
one of the first libraries to offer a coffee bar, partnering with
Starbucks Coffee in 1997. Since then, many other libraries,
both public and academic, have added coffee bars. Coffee
bars are, however, simply one instantiation of a larger trend
in libraries to create spaces that will be appealing to users
and can serve as social spaces. There is a particular concern
to appeal to undergraduates who, with the advent of the World
Wide Web, are believed by some to be using the library less
and less. Carlson’s article entitled “The Deserted Library”
portrayed a situation in which the increasing use of electronic
materials by students meant a decreasing use of materials
and services within the library building. This article caused a
certain degree
of panic in the academic library world, even, as Albanese 
reported, causing academic library directors to resign. This 
trend may also have influenced decisions in the late 1990s 
and early 2000s to spend money on academic libraries, both
renovations to existing buildings and construction of new
library buildings.


This trend also influenced the development of the 
LibQUAL+ survey instrument (www.libqual.org) to evaluate 
library service quality. LibQUAL+ assesses quality along four
dimensions, one of which is how the user feels about the 
library as a place. Along this dimension, LibQUAL+ includes 
questions concerning the user’s feelings about the library as: 


• A space that facilitates quiet study,
• A haven for quiet and solitude,
• A place for reflection and creativity,
• A comfortable and inviting location, and
• A contemplative environment.


All of these questions are concerned with the ambience 
of the space, and the user's feelings about the space. But 
what if the user finds coffee bars to be comfortable and 
inviting, or the park to be a place for reflection and creativity? 
What if the local coffee shop is more convenient for the user 
to get to? The physicality of libraries, however inviting, is by 
nature place-bound: the user has to physically travel to the
library to partake of the environment. Given a choice between 
different physical places in which similar tasks may be 
accomplished - e.g., a library or a bookstore in which to get
access to printed materials, or the local coffee shop as a place 
to sit and work - is it any wonder that users choose the more 
inviting physical environment? As the functionality of libraries 
increasingly comes to be available virtually, is it any wonder 
that users are not coming to the physical library, and are 
instead making use of this functionality in other, more inviting 
or more convenient physical environments?


Electronic Environment as Place : Alongside this 
movement in libraries to create more inviting physical
environments was a movement to create more “physical-like”
digital environments. This period of time in the mid- to
late-1990s saw the rise of information architecture as a 
profession and a field of study. 


While of course there is no such thing as a coffee bar 
online, there certainly are more or less inviting virtual 
environments. Kalay and Marx suggest that the natural and 
logical way to design an inviting virtual environment is to base 
it on physical space, that “designing places in Cyberspace 
can, indeed must, be informed by the principles that have 
been guiding physical place-making for centuries”. In an
extreme case of virtual design being based in physical design, 
Frischer describes the CVRLab at the University of California 
at Los Angeles, which creates “scientifically authenticated, 
3-D computer models of the world’s cultural-heritage sites”
such as the Roman Forum, the House of Augustus, and the
Colosseum in Rome— physical places, though ones for which 
it may no longer be possible to know precisely the details of 
the physical design. Not all virtual design is so literally based 
in physical design, however; some is more metaphorical. 
Basing the digital experience of place on the physical provides 
the user with a familiar environment, and thus creates fewer 
barriers for use. Information architecture and web design best
practices are consistent with this approach. Vincent Flanders,
on his entertaining website (webpagesthatsuck.com) and
in his book of the same name, lampoons websites with poor
design, and points out what elements contribute to this poor 
design, as a tongue-in-cheek method of teaching good web 
design. Flanders has gripes with many elements of web 
design, but fundamentally his message is a simple one— the 
function of web design is to deliver information to the user as 
efficiently as possible, without unnecessary design elements
to distract from the information itself. Jakob Nielsen delivers
a similar message on his website (useit.com) and in his book:
good design creates a positive experience for website users.
Virtual environments may be more or less “comfortable” due
to their design. People treat their experience of “moving 
through” virtual spaces as analogous to moving through 
physical spaces; a well designed virtual space therefore 
makes for a more inviting space for users. 


In a digital library, this philosophy of design may lead 
to a focus on what, in a physical library, is referred to as 
“signage”: the labeling of areas where materials or services 
are located so that the user may easily find them. Different 
digital libraries take different approaches to signage. In a 
digital space, the classification scheme is equivalent to the 
organization of the space. Further, the subset of the 
classification scheme that is available to the user is equivalent 
to signage. 
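
The relationship between a classification scheme and its signage can be made concrete with a small sketch. Everything here is hypothetical: the class labels, the nested-dictionary representation, and the choice to treat signage as the top few levels of the scheme are assumptions made for illustration, not a description of any particular digital library.

```python
# Hypothetical classification scheme: each key is a class label and each
# value holds that class's subclasses.
SCHEME = {
    "Science": {
        "Physics": {"Optics": {}, "Mechanics": {}},
        "Biology": {"Botany": {}},
    },
    "Arts": {
        "Music": {},
    },
}

def signage(scheme, depth):
    """Return the subset of the scheme shown to the user: labels down to
    `depth` levels, with everything below that hidden."""
    if depth <= 0:
        return {}
    return {
        label: signage(children, depth - 1)
        for label, children in scheme.items()
    }

def shelf_path(scheme, target, path=()):
    """Return the chain of labels leading to `target`, or None if absent.
    This plays the role of following signs to a shelf."""
    for label, children in scheme.items():
        here = path + (label,)
        if label == target:
            return here
        found = shelf_path(children, target, here)
        if found is not None:
            return found
    return None
```

Here `SCHEME` stands for the organization of the whole space, while `signage(SCHEME, 1)` exposes only the top-level labels, much as the signs at a library entrance name departments without listing every shelf.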


Another distinction between physical and electronic 
environments as spaces is that of the accessibility of the 
materials in those spaces. In a physical library, the user can
gain access to the collection only by physically coming to the
library. In a digital library, the user can gain access to the 
collection electronically from anywhere without physically 
going to the library. An interesting situation exists for physical 
libraries that maintain electronic components, however— 
electronic materials are not accessible physically, and 
physical materials are not accessible electronically. Thus, the 
physical library has two mutually inaccessible collections, the 
physical and the digital. In many cases, physical libraries do
have physical and digital versions of the same work. It is
suggested, however, that the very format of representation
supports different types of value-added services that make 
users’ experiences with the same intellectual work quite 
different. It is here where the issues raised by the environment 
of the library as place diverge most strongly— if the user is 
using digital materials, whether the user is in a physical library 
or at home or work, the experience will be more similar than
if the user is using physical materials for the same work in
the library or at home. In essence, the user’s experience of 
the library as place is more dependent on the form of the 
material and what additional services the library provides than 
it is on whether the user’s body is in a physical library or 
elsewhere using a digital library. 


While physical and digital collections are mutually 
exclusive, the physical and digital spaces “inhabited” by users 
of these collections are not. Rather, these spaces overlap, or 
are superimposed upon one another. A user may be present 
in both a physical and a digital library simultaneously, or may 
be present in some third place and making use of both digital 
resources and physical services. 


So, a continuum of environments exists, ranging from
entirely physical to entirely digital, with a range of hybrids. 
Spaces at the physical end of this continuum have the 
capacity to strongly affect the user’s experience. Spaces at 
the digital end of this continuum, on the other hand, are 
impoverished by comparison. Many online “spaces” are
impressive feats of technical expertise and data visualization
(see, for example, the Atlas of Cyberspaces,
www.cybergeography.org/atlas/, for some of the most
cutting-edge of these spaces), but many, if not most, are
indistinguishable from places of work or play because the 
same physical devices and interaction styles are required by 
the constraints of the technology. Consequently, physical 
libraries may be awesome and inspiring, while current digital 
libraries are impoverished spaces. These awesome and 
inspiring spaces, however, may be highly constraining, 
requiring considerable human efforts to access the ideas 
contained in the space, while the impoverished digital space 
may enable more direct access to those ideas. Thus, the 
classic architectural tension between form and function is 
quite vivid in libraries, both physical and digital. 


Individual Spaces : These awesome and inspiring 
spaces are also by nature shared spaces— most libraries are 
designed to serve multiple user populations and multiple 
users. The physical library is a space where individuals come 
together. Though it is possible for an individual to use the 
library herself, there are always going to be others in the 
space— other users or librarians. The only libraries that are 
designed for only one user are personal libraries, libraries 
collected by a single individual for her own personal use. Such 
libraries are not shared spaces; instead they are individual
spaces, often in one’s office or home, that is, in one’s personal
space, the space that one customizes to suit oneself.


Such individual spaces are beginning to emerge in the 
digital arena. Systems to assist with collection development 
for such personalized spaces are nascent but developing, as 
some work has been done already on personalized agents 
for information retrieval and filtering. Despite the fact that the
idea of personalized information spaces dates back at least
to the Memex, there has to date been little work done on
systems for personalized organization of the information
collected by a single individual. Experimental systems to
perform this function are only recently beginning to emerge,
and take two distinct approaches. The first is to allow users
to search their own files using a search engine, for
example Google Desktop (desktop.google.com) and Beagle
(beagle-project.org). The second is to perform what
Microsoft refers to as an “implicit query”: the user’s files are
searched automatically, and data retrieved context-sensitively.
As the technologies for wearable computing and
personal area networks improve and become more
widespread, it will be possible for more such individual
collections to be created that include traces of one’s personal 
experiences as integral parts of the collection - with significant 
implications for future searches and collection use. 


Borgman suggests that the next research front in the 
digital library arena is the personal digital library, and Beagrie
discusses several research and development fronts related
to personal information capture and management. It is
unclear, however, whether such personal information spaces 
may reasonably be referred to as libraries. Certainly, like 
traditional libraries, a personal information space is intended 
for a specific user population: indeed, a very specific user 
population of one. And like traditional libraries, a personal 
information space provides architecture for the collection and 
organization of materials and functionality for the use and 
sharing of these materials. But this is not sufficient to 
distinguish a personal information space from many other 
spaces, both physical and digital. What personal information 
spaces to date lack, that traditional libraries possess, are 
human-intermediated services, that is, librarians. Of course, 
early digital libraries also lacked human-intermediated 
services, and many still do, though such services have started 
to emerge in some large digital libraries. So there is reason 
to hope that the functionality of human intermediation may
yet be implemented in personal information spaces, though 
whether this will entail the specific user providing some form 
of mediation to herself displaced in space and/or time, or 
automating currently manual services, or development of 
software agents, or some other solution, remains to be seen. 
What is clear, however, is that digital libraries have potentially
much greater latitude in integrating and leveraging personal 
spaces than do physical libraries. 


Spaces for Work : Electronic spaces (e.g., digital 
libraries) allow people to engage with a certain set of 
functionalities (e.g., library services) in their own preferred 
physical space. If an individual prefers coffee shops or the 
park as spaces to work or study or reflect, then the individual 
may be less likely to use a physical library. In such a case, it
is necessary to bring that set of functionalities to the individual. 


Marchionini states that “as IT becomes more pervasive, 
the boundaries between physical and cyber spaces and 
between public and private spaces grow less distinct and more 
permeable.” This blurring of boundaries is most obvious in 
advances in what is termed ‘augmented reality’ where the 
physical world is literally overlaid with digital added values. 
Blueprints on eyeglasses for maintenance workers and 
medical imaging displays projected on surgery patients are 
popularized examples, while Back et al. demonstrated 
children’s books augmented with sound effects that take 
commercial learning systems such as LeapPad a step further. 
In a similar vein, Huang discusses what he calls “convergent 
architecture.” Huang, an architect of physical buildings, is 
somewhat more conservative when he states that “there is
little or no interaction or coordination between the activities
performed virtually and those performed physically... What
this comes down to is a failure to fulfill one of the central tenets
of architecture: aligning the structure, or form, of a space with 
its use or function”. 


A library is fundamentally a “pull” technology, where 
the user must go out of her way to deliberately retrieve 
materials. This is the case in both physical and digital libraries— 
a user must go to a physical library and must seek materials 
out and for certain types of materials must literally pull them 
off of shelves. Similarly, a user must go to a digital library and
must use a technology more commonly thought of as a pull 
technology — HTTP — to retrieve webpages. “Going” to the 
digital library, however, takes substantially less time and effort 
than going to the physical library since the going is virtual. 
Moreover, it is feasible that digital library materials may be 
‘pushed’ to users either through subscription- or advertising-
based alerting services such as RSS (Rich Site Summary).
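
The push-style alerting described here can be sketched with Python's standard library. Strictly speaking, a feed reader still pulls the feed over HTTP at intervals; the "push" is what the subscriber experiences when only new items are surfaced. The feed content, the example.org links and the function name below are hypothetical, and the sketch assumes a simple RSS 2.0 document already held in a string, so no network access is involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed announcing new items in a digital library.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>New Acquisitions (hypothetical feed)</title>
    <item><title>Item A</title><link>http://example.org/a</link><guid>a</guid></item>
    <item><title>Item B</title><link>http://example.org/b</link><guid>b</guid></item>
  </channel>
</rss>"""

def new_items(feed_xml, seen_guids):
    """Return (title, link) pairs for items not yet seen, and remember them.

    `seen_guids` is a set kept by the subscriber between polls; it is
    updated in place so that a later poll reports only fresh items.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.findall("./channel/item"):
        # Fall back to the link when an item carries no guid element.
        guid = item.findtext("guid") or item.findtext("link")
        if guid and guid not in seen_guids:
            seen_guids.add(guid)
            fresh.append((item.findtext("title"), item.findtext("link")))
    return fresh
```

A subscriber would call `new_items` each time the feed is fetched, passing the same `seen_guids` set. The first call reports every item, and later calls report only what has been added since.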


One important difference between the physical and 
digital library visit is the exclusivity of the physical visit. If one 
visits a physical place, one cannot simultaneously be in
another place. This is not the case in digital places where
individuals can simultaneously work in multiple spaces, either 
actively, for example through the use of multiple windows, or 
automatically, for example through automated processes and 
daemons. In fact, in the extreme case of identity theft, one 
could be “present” in a number of places at one time 
unknowingly executing actions that have significant meaning 
and impact on one’s life. This sort of presence management 
is critical in the physical world, as one’s preferences for 
interacting with information - indeed, one’s very ability to 
interact with information - changes as one moves through 
space— one may, for example, prefer to communicate by 
telephone and to interact with information on the web in the 
office, but to communicate face-to-face and interact with paper 
at home. Of course, in one’s interactions with physical objects 
in the physical world this is not called presence management;
it is simply presence.


An increasing number of physical libraries are 
maintaining digital collections alongside their print collections, 
and as this trend continues the experience of using a physical 
library will become more fully integrated functionally with 
digital materials, and in time hopefully architecturally as well. 
As digital libraries proliferate and both portable computing 
and wireless networking improve, the experience for a user 
of using a digital library will become more fully integrated with 
her daily activities in the physical world. Brunk aptly suggests, 
as technology advances and as physical and digital library 
functionalities merge, “it will become harder to distinguish 
between the physical world and the online world.” 


This can be illustrated with a self-centered example. Our
offices are physical spaces in which we do work. Physically,
these spaces are fairly small. For the purposes of our work, 
however, these spaces are almost unboundedly large, when 
viewed from the perspective of our ability to access electronic
materials from within these spaces. Indeed, the sheer quantity
of data that is available - on the free internet and in databases, 
available to us courtesy of the university's campus network - 
as we go about our work is greater than the physical contents 
of all of the physical libraries on campus, and perhaps greater 
than the physical contents of all of the physical libraries in the 
world. In fact, it is not our offices that are important in allowing 
us to access this information, but our computers. The 
approximately one cubic meter of physical space - the space 
bounded by eyes, hands, keyboard, and screen - is 
unboundedly large, in our ability to access information. 
Indeed, we may own laptops, so this physical space is 
portable, which is of course the whole raison d'être of laptops:
this cubic meter of physical space may move throughout
physical space, but thanks to the existence of a proxy server
on the campus network, our ability to access electronic
materials from this space is unchanged, at least when that
physical space overlaps with a physical space containing an 
accessible wired or wireless network connection. 


William Mitchell in his book City of Bits claimed that 
this integration of technology and networking into individuals’ 
daily lives and activities was increasingly converting people 
into “Cyborg Citizens.” As ominous as this perhaps sounds, 
Mitchell views it as natural and desirable— in the way that 
eyeglasses improve vision, so other technologies improve 
other human senses and functions. He goes on to discuss 
the environments that are created, even necessitated, by 
these Cyborg Citizens, both in their personal and work lives— 
both entirely virtual “spaces” and integrated physical and 
virtual spaces, reminiscent in fact of Huang’s “convergent 
architecture.” Mitchell argues that these are spaces in which 
individuals and resources intersect— for example, employees 
intersect with their work as it is either pushed to or pulled by 
them and functionality is brought to the individual. In this vein, 
libraries and other information services are beginning to offer
push functionality to users through RSS feeds that continually
broadcast information to subscribers. 


These capabilities both support individual use of the 
digital library and lead to new kinds of spaces for collaboration. 


Spaces for Collaboration : Electronic spaces bring 
functionality not only to the individual, but to groups. As 
Mitchell points out, telecommuting enables both the virtual
“cottage industry” and the virtual corporate office. The 
difference between these work environments is simply one 
of scale— how many individuals are collaborating, performing 
and sharing work. Marchionini gives this virtual space the 
name sharium, “a workspace with rich content and powerful 
tools where people can work independently or collaborate 
with others to learn and to solve their information problems”. 
The sharium is an extension of the digital library, 
encompassing functionality that will enable users to utilize 
the sharium “as a problem solving space where individual or 
collaborative investigation and construction of new knowledge
takes place”. The sharium provides the tools and virtual 
spaces to which people may bring their own materials as well 
as use the library’s materials. People working in this space 
may work alone, but may gain enormously by working with 
others to create new ideas and material manifestations of 
those ideas. Moreover, people may contribute those materials 
back to the library and the larger community. In the sharium, 
people also contribute time and expertise as members of 
communities of interest physically distributed around the 
globe. In the sharium, shared wikis, smartboards, and 
teleconferencing equipment; personal blogs; ephemeral chat 
rooms; and archival forums serve as analogs to the meeting 
rooms, bulletin boards, and public kiosks in physical libraries. 
The sharium is not bounded by physical space and allows 
extensive contributions by stakeholders. These contributions 
serve as a potentially valuable set of ideas and materials that 
may eventually be evaluated by the community either
informally or formally. Open-source software communities are
among the highest-profile examples of the contribution feature 
of the sharium, but other examples include the Wikipedia 
(en.wikipedia.org), a contributor-constructed encyclopedia 
containing millions of articles; and the Open Directory 
(www.dmoz.org), a directory of the Web in which contributors
classify websites by topic. These emerging contributor- 
created information resources represent important 
alternatives to the carefully curated collections produced by 
professional librarians in physical and digital libraries. 


Kranich refers to shared information spaces as 
information commons: “a resource, or a facility, that is shared 
by a community of producers or consumers”. Information 
commons are “areas” in the public domain that are maintained 
as public - as, for example, parks and water resources are 
public - for their value in “promoting democracy and the free 
flow of ideas”. The information commons is thus an alternative 
to the “enclosure” of intellectual property and information 
resources. The sharium is a space for collaboration, but the 
information commons politicizes that collaboration. Kranich 
suggests that public spaces and the public interest go hand 
in hand in a democratic society. Information, like parks, 
Kranich suggests, is a resource that is best and most useful 
when owned in common by all potential users, and private 
ownership of such resources is detrimental to civil society. 
The information commons, as a resource that is not 
diminished by use, which is in fact enhanced by contributions 
from users, is thus an important alternative to the more 
common copyrighting and patenting of intellectual property. 


The term “information commons” actually has two more
or less distinct meanings, the first being roughly equivalent
to “sharium.” The second meaning is roughly equivalent to
“collaboratory.” An information commons, according to this 
definition of the term, is a space designed to enable shared 
use of computing resources. The Symonds Labs at Rice
University (www.symonds.rice.edu) and the Virtual Village at
the Public Library of Charlotte & Mecklenburg County
(plcmc.org/libLoc/mainVirtualVillage.htm) are outstanding
examples of such spaces. Information commons of these 
types may take any number of forms — computer labs with 
seating arranged so that multiple users can share access to 
one computer, seating arranged in booths, group study rooms, 
wired classrooms, lounge-like spaces with wireless network 
access, and probably as many other configurations as teams 
of librarians and architects could envision. 


Information commons are spaces deliberately designed 
to bring together the three components of which information 
science is composed: information, technology, and people. 
Information commons qua sharia are intellectual spaces to
which people come to use shared information resources, 
made available through the use of technology. Information 
commons qua collaboratories are physical spaces to which 
people come to use shared technological resources, as a 
means of accessing and utilizing information resources. Both 
types of information commons are shared spaces, and both 
are efforts to foster social use and construction of technology 
and information. 


Spaces for Education : The idea of social construction 
of information is not a new one. This idea is instantiated in 
the pedagogical style called cooperative or collaborative 
learning in the field of education. These terms refer to 
pedagogical styles in which students work together in groups 
toward common educational goals. These approaches are 
rooted in the social learning theory of Vygotsky whose ideas 
about zones of proximal development emphasize how crucial 
physical interaction is to learning. Libraries have always to 
some extent been environments that supported this sort of 
collaborative learning, but in recent years new library building 
projects have been deliberately positioning the library in this 
role. In fact, physical libraries have created information
commons to invite students and faculty to a physical space
devoted to shared conceptual spaces of ideas and social 
interaction. Academic libraries in particular have positioned 
themselves in this role, but increasingly public libraries are 
doing so as well. Academic libraries have always contained 
spaces for collaboration— work tables, study carrels, small 
conference rooms, and the like. Nowadays, more and more 
such spaces are being designed into public libraries. 


There has been a burst of literature in recent years 
describing renovation and expansion projects in academic 
libraries. Freeman suggests that an academic library should 
address the “psychosocial aspects of an academic 
community”. The library contributes to students’ educational 
experience by providing a common space that “cut[s] across 
all disciplines and functions”. In this way, the library is 
conceived as a place that by its very nature supports the 
academic mission of the institution. The nature of this type of 
library is as an information commons, in both the physical 
and intellectual senses of the term—a shared space containing 
information resources. Miller suggests that simply being an
information commons is insufficient, however; the academic
library must also be “staffed with professionals who can help
people with their intellectual inquiries”.


On the one hand, this approach to library construction 
reifies the academic library’s place as the intellectual center 
of the educational institution. On the other hand, this approach 
emphasizes how many of the library’s traditional functions 
can be accomplished electronically. In order to maintain the
library’s centrality to the educational institution, when many
of the library’s resources can be used electronically, academic
libraries must position themselves not as spaces for materials
but as spaces for people. Indeed, the resources in the library
may be as much for show, props to create an environment
that users find inviting, as for actual use. Freeman suggests
that “a significant majority of students still considers the
traditional reading room their favorite area of the library - the
great, vaulted, light-filled space, whose walls are lined with
books they may never pull off the shelf”. In short, academic
libraries are positioning themselves as community centers— 
as places for users to “hang out” individually or in groups, for 
work or social purposes. Essentially libraries are positioning 
themselves as third places with network access and resident 
librarians. 


Collaborative Services : One of the most fundamentally 
collaborative functions of libraries is reference work. 
Reference work has traditionally also been place-bound, 
performed at the reference desk. The cornerstone of desk- 
based reference work is the interview - even if research 
indicates that a reference interview is conducted only 
approximately half the time. The reference interview is 
stereotypically a one-to-one conversation between the 
librarian and the information seeker, in which the librarian 
attempts to elicit the seeker’s information need, in the face of 
the seeker’s quandary of asking a question on a topic about 
which she may know little or nothing. The interview is viewed
as central to this interaction, as the collaboration between 
the librarian and the information seeker allows the librarian 
to assist the user to focus her question and allows the librarian 
to provide appropriate resources. 


One aspect of physical place that digital reference 
currently cannot emulate is the quality of being face to face:
transactions conducted via these forms of media lack the
richness of a face-to-face conversation. The forms of non-
verbal communication that a librarian relies on in the reference
transaction - from the user’s body language to the librarian’s
ability to size up a user based on appearance and manner -
are lost. This is similarly true in other functions of the digital
library; computer-mediated communication is less rich a 
communication medium than face-to-face communication. 
Coffman states that for this reason “it is hoped that chat is an
interim technology which will soon give way to something
much more humane like voice.” It is not clear that Voice Over 
IP (VoIP) is likely to take off as a medium for reference work, 
nor is VoIP as rich a medium as face-to-face, but 
communication via VoIP is functionally similar to 
communication via telephone, and the telephone is a well- 
accepted technology for collaboration in a variety of settings. 
What is more, voice-based reference services appear to be 
staging a comeback in some more cutting-edge libraries. 
Videoconferencing technology has also improved and 
become affordable enough that future reference services may 
utilize it to return to the preferred model of face-to-face 
communication, though there is some evidence that the 
various forms of non-verbal communication may not be critical 
to collaboration. Videoconferencing technology may yet have 
a future in digital libraries to similarly support collaboration, 
though the appropriate uses of such technology are a topic in
need of future research. Further on the horizon is the possibility
of virtual reality services in which people project themselves
into virtual spaces “staffed” by avatars of real librarians or
software agents, “cyberprofessionals”.


Pomerantz discusses issues involved in integrating 
digital reference service into digital libraries. One of the unique 
aspects of reference work in this environment is that reference 
librarians become collection developers, as their answers to 
reference questions may be stored in the digital library, along 
with any materials these answers point to. Thus, over time, 
materials “accrete” in the digital library due to the fact that 
they are part of or pointed to by reference transactions. 
Ackerman and Malone refer to one such system as an Answer 
Garden. This form of collection development runs counter to 
the manner in which collection development has traditionally 
been performed: instead of a staff of collection development
librarians collecting only those materials for the library that
have been deliberately vetted, collection development
becomes a collaborative process between reference librarians
and users. In this form of collection development, librarians
provide users with materials or links to materials in their effort
to fulfill the users’ information needs. Indeed, the environment
in which the reference transaction occurs thus becomes a
sharium, a collaborative environment in which participants
solve an information problem, and in which the artifacts of
that collaboration are stored.


Another unique aspect of reference work in the digital 
environment is that reference transactions become 
annotations to any materials that are provided to the user by 
the librarian. A reference transaction may contain URLs or 
some other form of “pointer” to particular information 
resources. Over time, as reference transactions are archived, 
these pointers may be mined to create a “profile” for each 
information resource in the collection. This profile may serve 
to identify what types of questions, or what information needs,
particular information resources may be used to answer. It is
mentioned above that one possibility for human intermediation 
in personal information spaces is for the user to provide some 
form of mediation to herself; this type of information resource 
profile development could fulfill this function. This profile 
development could also be useful, as Bollen and Luce 
suggest, in identifying the preferences of entire user 
communities, and in the creation of special collections on 
particular topics or for particular uses and user communities. 
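

The profile-building idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the archive format, the subject terms, the example.org addresses and the function name build_resource_profiles are all hypothetical and do not come from any particular digital reference system.

```python
import re
from collections import Counter, defaultdict

# Hypothetical archive: each record holds the librarian's answer text and
# subject terms assigned to the user's question (all invented for illustration).
TRANSACTIONS = [
    {"subjects": ["census", "population"],
     "answer": "Try http://example.org/census-tables and http://example.org/atlas."},
    {"subjects": ["population", "migration"],
     "answer": "See http://example.org/census-tables for district figures."},
    {"subjects": ["maps"],
     "answer": "The atlas at http://example.org/atlas covers this."},
]

# A deliberately simple pattern for the "pointers" embedded in answers.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def build_resource_profiles(transactions):
    """Return {url: Counter of subject terms of the questions it helped answer}."""
    profiles = defaultdict(Counter)
    for record in transactions:
        # Strip trailing sentence punctuation; count each URL once per transaction.
        urls = {u.rstrip(".,;") for u in URL_PATTERN.findall(record["answer"])}
        for url in urls:
            profiles[url].update(record["subjects"])
    return profiles


PROFILES = build_resource_profiles(TRANSACTIONS)
for url, subjects in sorted(PROFILES.items()):
    print(url, subjects.most_common(2))
```

A profile built this way is only as good as the subject terms attached to each question; a real service would probably need some vocabulary control before such counts could guide collection development.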


So it can be concluded that for many users of a physical 
library’s resources and services, the library’s digital 
component is what they interact with most, or even 
exclusively. As Carlson points out, library users’ increasing 
use of electronic materials often means a decreasing use of 
materials within the physical building. This trend will likely be 
exacerbated by efforts such as the Google Books Library 
Project. Thus, as Taylor points out, physically going to the 
library is often the last resort for users. As the internet
becomes more and more a part of people’s everyday life,
however, and people increasingly turn to online sources for
their information seeking tasks, it is possible that digital
libraries will be preferred by many to physical libraries. It is
not, one suspects, that people do not go to the physical library
because they object to the library per se, but rather simply 
that it is less convenient in certain cases than using online 
sources. Digital libraries are a possible solution to this 
problem. This creates the dual role for physical libraries of 
providing access to specialized materials not available online, 
and as social spaces. 


As more digital libraries are built, and as more physical 
libraries offer electronic access to parts of their collection, 
two trends are likely to result:


• The role of the library as a storage space for materials
will become decreasingly important.


• The role of the library as a space for users, for individual
and collaborative work, and as a space for social
activity will become increasingly important.


Some non-digital materials may never have digital 
representations made of them, and some non-digital materials 
will always need to be accessed in their original forms for 
certain purposes. These materials must be stored and made 
accessible somewhere, so it is suggested that the library and 
the archive will continue to be relevant for the foreseeable 
future. But the quantity of digital materials - whether digital
representations of physical materials or “born digital” -
continues to increase, and to far outstrip the increase in the
quantity of print materials. Storage, organization, and
preservation of physical materials will continue to be 
important, but as the quantity of digital materials in the world 
continues to increase, storage, organization, and preservation 
of digital materials will be increasingly important. 




As a result of the vast increase in the quantity of digital 
material, information spaces will increase in importance as 
spaces where users can make use of the digital materials 
available to them. The information commons will come to be 
increasingly important in libraries, as both a physical and an 
intellectual space. Each of these types of information
commons supports the other: as physical commons become
more widespread there will be increasing opportunity and 
need for users to access material in intellectual commons, 
and as intellectual commons come to contain more material 
and possess more functionality there will be more incentive 
for users to make use of them both individually and collectively 
in physical commons. 


As libraries, both digital and physical libraries fulfill the 
same functions— both are cognitive spaces that can be 
intellectually moved through and modified to suit cognitive 
needs. As spaces, digital and physical libraries differ in their 
capacity to fulfill the same functions. Digital libraries are unable 
to fulfill some of the functions of the physical library as physical 
spaces, but are able to offer functions beyond what the 
physical library can offer as cognitive spaces. This is likely to 
be where much of the development of digital libraries will occur 
in the years ahead— enhancing the cognitive space by 
augmenting representations of the ideas in the materials with 
new kinds of extensions, hyperlinks, and annotations, while 
also adding capabilities for users to create profiles that support 
more personalized interactions among people and digital 
libraries. 


Digital libraries are not physical spaces, and so are 
unable to fulfill those functions for which the physicality of the 
library is important, functions of the library that are by nature 
place-bound. One such function of the library is to be a place 
for people to congregate; in short, the digital library cannot 
fulfill the function of a library as a physical community center.
Another such function is to be a space that can be physically
moved through and modified to suit physical needs. Thus, 
the visceral advantages of holding, seeing, and smelling 
material objects and the sense of awe that well-designed 
physical spaces offer are missing in digital libraries. Just as 
face-to-face communication is often preferable to mediated 
communication, working with digital materials leaves 
something behind in exchange for convenience and new 
functionality. Working with digital materials and in virtual 
spaces, however, enables sharium-style functionality that
may be used to expand the community of users far beyond 
what is possible in a physical space. 


Physical space is an important constraint in physical 
libraries, but minimally important in digital libraries. The 
concept of place, however, is equally important in both 
physical and digital libraries because a sense of place is 
dependent on functionality, community, and personal 
experiences in the place. Digital libraries support new kinds 
of functionality, much broader communities, and emerging 
senses of place. Digital libraries project a ‘look and feel’ 
through interfaces that ‘brand’ the library and give patrons a 
vicarious sense of place. Although this vicarious sense of 
place cannot match the real place, it is argued that digital 
libraries offer important new extensions to physical libraries 
and that these extensions are mainly toward the conceptual 
side of the space continuum, where more malleable 
representations and extensions are possible. This is viewed
as an important move forward in making ideas more 
accessible to people and the new capabilities in these digital 
places will cause people to seek them out and appreciate 
them just as they do physical libraries. 


Kohl argues that “we have outgrown the metaphor” of 
the library as place. At best, Kohl suggests, “a place can 
continue to be part of the definition of a library, but it is reduced 
to only a part”. It is agreed that the metaphor of place must in
future be only part of the conception of the library. However,
it will remain an important part, as the notion of what a place 
is continues to change. It is believed that libraries are 
fundamentally spatial, but that the definition of space must 
be broadened: the most critical element of this space may
not be that it is either physical or virtual, but that it is intellectual. 


2.4. ADVANTAGES AND DISADVANTAGES OF DIGITAL 
LIBRARIES 


The fundamental reason for building digital libraries is 
a belief that they will provide better delivery of information 
than was possible in the past. Traditional libraries are a 
fundamental part of society, but they are not perfect. Can we 
do better? Enthusiasts for digital libraries point out that 
computers and networks have already changed the ways in 
which people communicate with each other. In some 
disciplines, they argue, a professional or a scholar is better 
served by sitting at a personal computer connected to a 
communications network than by making a visit to a library. 
Information once available only to the professional is now 
directly available to all. From a personal computer, the user 
is able to consult materials that are stored on computers 
around the world. Conversely, all but the most diehard 
enthusiasts recognize that printed documents are so much a
part of civilization that their dominant role in storing and 
conveying information cannot change except gradually. 
Though some important uses of printing may be replaced by 
electronic information, not everybody considers a large-scale 
movement to electronic information desirable, even if it is 
technically, economically, and legally feasible.


While traditional libraries are limited by storage space,
digital libraries have the potential to store much more
information simply because digital information requires very
little physical space to contain it. As such, the cost of
maintaining a digital library is much lower than that of a
traditional library. A traditional library must spend large sums
of money paying for staff, book maintenance, rent, and
additional books. Digital libraries do away with these fees.


Digital libraries can immediately adopt innovations in
technology, providing users with improvements in electronic
and audio book technology as well as presenting new forms
of communication such as wikis and blogs. Here are some of
the potential advantages of digital libraries.


• No Physical Boundary : Unlike traditional libraries, digital
libraries have no physical existence. Hence, the user of a
digital library need not go to the library physically.


• Round-the-clock Availability : A major advantage of
digital libraries is that people from all over the world
can gain access to the information at any time, as long
as an Internet connection is available.


• Structured Approach : A digital library provides access
to much richer content in a more structured manner,
that is, we can easily move from the catalog to the
particular book, then to a particular chapter, and so on.


• Digital Library brings the Library to the User : To use
a traditional library, a reader must go there. At a
university this may take only a few minutes, but most
people are not at universities and do not have a library
nearby. Many engineers and many physicians have
depressingly poor access to the latest information.


A digital library brings the information to the user’s desk,
either at work or at home. With a digital library on the
desk top, the reader need never visit a library building.
There is a library wherever there is a personal computer
with a network connection.


• Computer Power is used for Searching and Browsing :
Paper documents are convenient to read, but finding
information that is stored on paper can be difficult.
Despite the myriad of secondary tools and the skill of
reference librarians, using a large library can be a tough
challenge. A claim that used to be made for traditional
libraries is that they stimulate serendipity, because
readers stumble across unexpected items of value. The
truth is that libraries are full of useful materials that
readers discover only by accident.


In most aspects, computer systems are already better 
than manual methods for finding information. They are 
not as good as everybody would like, but they are good 
and improving steadily. Computers are particularly 
useful for reference work that involves repeated leaps 
from one source of information to another. 


• Information can be Shared : Libraries and archives
contain much information that is unique. Placing digital 
information on a network makes it available to 
everybody. Many digital libraries or electronic 
publications are maintained at a single central site, 
perhaps with a few duplicate copies strategically placed 
around the world. This is a vast improvement over 
expensive physical duplication of little used material, 
or the inconvenience of unique material that is 
unobtainable without traveling to the location where it 
is stored. 


• Information is Easier to keep Current : Much important
information needs to be updated continually. Printed 
materials are awkward to update, since the entire 
document must be reprinted and all copies of the old 
version must be tracked down and replaced. Keeping 
information current is less laborious when the definitive 
version is in digital format and stored on a central 
computer. 




Many libraries maintain online versions of directories, 
encyclopedias, and other reference works. Whenever 
revisions are received from the publisher, they are 
installed on the library’s computer. 


In addition, there is flexibility in the use of search terms, 
that is, key words. A digital library can provide very user- 
friendly interfaces, giving clickable access to its 
resources. 


• Information is always Available : The doors of the digital
library never close; a recent study at a British university 
found that about half the use of a library’s digital 
collections was at hours when the library buildings were 
closed. Materials are never checked out to other 
readers, mis-shelved, or stolen; they are never in an 
off-campus warehouse. The scope of the collections 
expands beyond the walls of the library. Private papers 
in an office or in a library on the other side of the world 
are as easy to use as materials in the local library. 


This does not imply that digital libraries are perfect. 
Computer systems can fail, and networks may be slow 
or unreliable. But, compared with a traditional library, 
information is much more likely to be available when 
and where the user wants it. 


• New Forms of Information become Possible : Print is
not always the best way to record and disseminate 
information. A database may be the best way to store 
census data, so that it can be analyzed by computer. 
Satellite data can be rendered in many different ways. 
A mathematics library can store mathematical 
expressions as computer symbols that can be 
manipulated by means of a program such as 
Mathematica or Maple. 


Even when the formats are similar, materials created
explicitly for the digital world are not the same as
materials originally designed for paper or other media. 
Words that are spoken have a different impact from 
words that are written, and online textual materials are 
subtly different from either the spoken or the printed 
word. Good authors use words differently when they 
write for different media and users find new ways to 
use the information. Materials created for the digital 
world can have a vitality that is lacking in material that 
has been mechanically converted to digital formats, just 
as a feature film never looks quite right when shown on 
television. 
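

To make the point about computer-based searching in the list above more concrete, the following is a minimal sketch of keyword search over catalogue titles using an inverted index. The inverted index is a common technique chosen here for illustration; the text does not say which method any given digital library uses, and the three catalogue entries and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalogue entries, invented for illustration.
CATALOGUE = {
    "doc1": "Digital preservation of archival documents",
    "doc2": "Metadata for digital resources",
    "doc3": "Preservation of printed documents",
}


def build_index(docs):
    """Map each lower-cased word to the set of document ids whose title contains it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for word in title.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents whose titles contain every word of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


INDEX = build_index(CATALOGUE)
print(sorted(search(INDEX, "digital preservation")))
print(sorted(search(INDEX, "documents")))
```

Because the index is built once and each query only intersects small sets, repeated leaps from one query to the next stay cheap, which is the property the reference-work example relies on.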


Each of the benefits described above can be seen in 
existing digital libraries. Another group of potential benefits, 
which have not yet been demonstrated, hold out tantalizing 
prospects. The hope is that digital libraries will develop from 
static repositories of immutable objects to provide a wide 
range of services that will allow collaboration and exchange 
of ideas. The technology of digital libraries is closely related 
to the technology used in electronic mail and teleconferencing, 
which have historically had little relationship to libraries. The 
potential for convergence between these fields is exciting. 
Lastly, when the library has no space for extension, digitization 
is the only solution. 


A potential benefit is that digital libraries may save 
money. There has been a notable lack of hard data on the 
cost of digital libraries, but some of the underlying facts are 
clear. Conventional libraries are expensive. They occupy 
expensive buildings on prime sites. Big libraries employ 
hundreds of people—well educated, though poorly paid. 
Libraries never have enough money to acquire and process 
all the materials they desire. Publishing is also expensive. 
Converting to electronic publishing adds new expenses. To 
recover the costs of developing new products, publishers 
sometimes charge more for a digital version than for the
printed equivalent.


Today’s digital libraries are also expensive—initially, 
more expensive than conventional ones. In theory, the cost 
of maintaining a digital library is lower than that of a traditional 
library. A traditional library must spend large sums of money 
paying for staff, book maintenance, rent, and additional books. 
Although digital libraries do away with these fees, it has since 
been found that digital libraries can be no less expensive in 
their own way to operate. 


Moreover, the components of digital libraries are 
declining rapidly in price. As the cost of the underlying 
technology continues to fall, digital libraries become steadily 
less expensive. In particular, the costs of distributing and of 
storing digital information decline. The reduction in cost will 
not be uniform. Some things are already cheaper by computer 
than by traditional methods. Other costs will not decline at 
the same rate and may even increase. Overall, however, there 
is a great opportunity to lower the costs of publishing and 
libraries. 


Lower long-term costs are not necessarily good news 
for existing libraries and publishers. In the short term, the 
pressure to support traditional media alongside new digital 
collections is a heavy burden on budgets. Because people 
and organizations appreciate the benefits of online access 
and online publishing, they are prepared to spend an 
increasing amount of their money on computing, networks, 
and digital information. Most of this money, however, is going 
not to traditional libraries but to new areas— computers, 
networks, web sites, and webmasters. 


Publishers face difficulties because the normal pricing 
model of selling individual items does not fit the cost structure 
of electronic publishing. Much of the cost of conventional 
publishing is in the production and distribution of individual 
copies of books, photographs, video tapes, or other artifacts.
Digital information is different. The fixed cost of creating the
information and mounting it on a computer may be substantial, 
but the cost of using it is almost zero. Because the marginal 
cost is negligible, much of the information on the networks 
has been made openly available, with no restrictions on
access. Not everything on the world’s networks is freely
available; however, a great deal is open to everybody, and 
this undermines revenue for the publishers. These pressures 
are inevitably changing the economic decisions of authors, 
users, publishers, and libraries. 
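

The cost structure described above (a substantial fixed cost and a negligible marginal cost) can be illustrated with a small calculation. The figures below are hypothetical and chosen only to show the shape of the relationship; the function name average_cost_per_use is likewise an assumption made for this sketch.

```python
def average_cost_per_use(fixed_cost, marginal_cost, uses):
    """Average cost of one use: fixed cost spread over all uses, plus marginal cost."""
    if uses <= 0:
        raise ValueError("uses must be positive")
    return fixed_cost / uses + marginal_cost


# Hypothetical figures, not taken from any real library or publisher.
FIXED_COST = 50_000.0   # creating the information and mounting it on a computer
MARGINAL_COST = 0.01    # delivering one additional copy over the network

for uses in (100, 10_000, 1_000_000):
    print(uses, round(average_cost_per_use(FIXED_COST, MARGINAL_COST, uses), 4))
```

As the number of uses grows, the average cost approaches the marginal cost, which is why pricing by the individual copy fits electronic publishing poorly.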


Additionally, an exact copy of the original can be made
any number of times without any degradation in quality.
Moreover, a digital library can very easily link to the
resources of other digital libraries; thus seamlessly
integrated resource sharing can be achieved. Besides, the
digital library at every level should form an integral part of
global networks. The value of library and information networks
should not be underestimated. E-mail, FTP, Telnet, HTTP, Gopher,
WWW, online catalogues, directories, guides, databases,
e-journals, e-newsletters, discussion lists, bulletin boards,
Usenet newsgroups, Archie, Veronica, WAIS, Mosaic,
Netscape, etc., are a few of the valuable services available via
the Internet, which enable us to provide massive access to global
information.


Turning to disadvantages, some critics point out that
digital libraries are hampered by copyright law, because
works cannot be shared over different periods of time in the
manner of a traditional library. The content is, in many
cases, limited to public domain or self-generated material.
only. Some digital libraries, such as Project Gutenberg, work 
to digitize out-of-copyright works and make them freely 
available to the public. 


Digital libraries cannot reproduce the environment of a 
traditional library. Many people also find reading printed
material to be easier than reading material on a computer
screen, although this depends heavily on presentation as well 
as personal preferences. Also, due to technological 
developments, a digital library can see some of its content 
become out-of-date and its data may become inaccessible. 


2.5. CONVENTIONAL VERSUS DIGITAL LIBRARIES 


Digital libraries are sets of electronic resources and 
associated technical capabilities for creating, searching, and 
using information. In this sense they are an extension and 
enhancement of information storage and retrieval systems 
that manipulate digital data in any medium (text, images,
sounds; static or dynamic images) and exist in distributed
networks. The content of digital libraries includes data,
metadata that describe various aspects of the data (e.g.,
representation, creator, owner, reproduction rights), and
metadata that consist of links or relationships to other data
or metadata, whether internal or external to the digital library.
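

The three kinds of content named above (data, descriptive metadata, and link metadata) might be modelled as follows. This is a sketch only: the class names, field names and sample values are hypothetical and do not correspond to any metadata standard or to a particular digital library system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DescriptiveMetadata:
    """Metadata describing aspects of the data itself."""
    representation: str        # e.g. a file format
    creator: str
    owner: str
    reproduction_rights: str


@dataclass
class Link:
    """Metadata expressing a relationship to other data or metadata."""
    relation: str
    target: str
    external: bool = False     # True if the target lies outside this digital library


@dataclass
class DigitalObject:
    identifier: str
    data: bytes
    description: DescriptiveMetadata
    links: List[Link] = field(default_factory=list)


def external_links(obj):
    """Return the links of an object that point outside the digital library."""
    return [link for link in obj.links if link.external]


# Hypothetical record, invented for illustration.
RECORD = DigitalObject(
    identifier="map-001",
    data=b"...image bytes...",
    description=DescriptiveMetadata("image/tiff", "Survey Office", "City Archive", "public domain"),
    links=[
        Link("part-of", "collection-maps"),
        Link("see-also", "http://example.org/other-library/item/42", external=True),
    ],
)
print([link.target for link in external_links(RECORD)])
```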


Digital libraries are constructed—collected and 
organized—by and for a community of users, and their 
functional capabilities support the information needs and uses 
of that community. They are a component of communities in 
which individuals and groups interact with each other, using 
data, information, and knowledge resources and systems. In 
this sense they are an extension, enhancement, and 
integration of a variety of information institutions as physical 
places where resources are selected, collected, organized, 
preserved, and accessed in support of a user community. 
These information institutions include, among others, libraries, 
museums, archives, and schools, but digital libraries also 
extend and serve other community settings, including 
classrooms, offices, laboratories, homes, and public spaces. 
Implicit in this description of digital libraries is a broad
conceptualization of library “collections.” 




One theme is that digital libraries encompass the full 
information life cycle— capturing information at the time of 
creation, making it accessible, maintaining and preserving it 
in forms useful to the user community, and sometimes 
disposing of information. With physical collections, users 
discover and retrieve content of interest; their use of that 
material is independent of library systems and services. With 
digital collections, users may retrieve, manipulate, and 
contribute content. Thus users are dependent upon the
functions and services provided by digital libraries, and their
work practices may become more tightly coupled to system
capabilities.


A second theme implicit in the description of digital libraries
is the expanding scope of content that is available. Content 
now readily available in digital form includes primary sources 
such as remote sensing data, census data, and archival 
documents. Use of scientific data sets is computationally 
intensive, raising questions about the role the library should 
play in providing access to the resources and to the tools to 
use them. Nor are scientific data the only challenge. As more 
archives and special collections are digitized, many primary 
sources in the humanities are becoming more widely available 
online than are secondary sources such as books and
journals. Distinctions between primary and secondary sources
are problematic, however, as they vary considerably by
discipline and by context. Some sources may be primary for
some purposes and secondary for others. Here we can
oversimplify the terms by referring to raw data and to unique
or original documents as primary sources and to analyzed or
compiled data and to reports of research as secondary 
sources. 

A third theme is the need to maintain coherence of
library collections. Descriptions of journal articles, for example,
can be found in catalogues, indexing and abstracting
databases, and digital libraries. Users want to identify articles
of interest and to move seamlessly from bibliographic 
references to the full text, and from references in those texts 
directly to the full content of the cited articles. Sometimes 
they also wish to link directly to primary sources on which the 
articles are based. Supporting these uses of journal-related 
information requires various forms of links within and between 
many independent catalogues, databases, and digital 
libraries. 
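

The kind of linking described above, from a bibliographic reference to the full text and on to the cited articles, can be sketched with two small lookup tables. The identifiers, record layout and function names here are hypothetical; real services link across many independent systems and are considerably more involved.

```python
# Hypothetical catalogue of bibliographic records.
CATALOGUE = {
    "art-1": {"title": "Collaborative reference services", "full_text_id": "ft-1"},
    "art-2": {"title": "Annotating digital collections", "full_text_id": "ft-2"},
}

# Hypothetical full-text store; "cites" lists catalogue ids of cited articles.
FULL_TEXTS = {
    "ft-1": {"body": "Text of the first article.", "cites": ["art-2", "art-9"]},
    "ft-2": {"body": "Text of the second article.", "cites": []},
}


def resolve_full_text(article_id):
    """Follow the link from a bibliographic record to its full text, if any."""
    record = CATALOGUE.get(article_id)
    if record is None:
        return None
    return FULL_TEXTS.get(record["full_text_id"])


def cited_articles(article_id):
    """Split an article's citations into those held locally and those that are not."""
    full_text = resolve_full_text(article_id)
    if full_text is None:
        return [], []
    resolved = [c for c in full_text["cites"] if resolve_full_text(c) is not None]
    unresolved = [c for c in full_text["cites"] if resolve_full_text(c) is None]
    return resolved, unresolved


print(cited_articles("art-1"))
```

The unresolved list is where the need for links between independent catalogues, databases, and digital libraries shows up: a citation that cannot be followed locally has to be handed to some other system.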


Now the question arises: are Digital Libraries different
from Conventional Libraries? If so, then why was the word
“Libraries” tacked onto “Digital” at all? On the other hand,
if they are the same, then why do we need the term “Digital” at
all? In fact there are people like R M Braud who feel that
using the term “digital” is a redundancy. Braud says “The
product that we manage in libraries, information, and the
familiar container for that product, the codex book. These
containers have influenced library architecture, but they do
not themselves define what a library is. We do not bother to
qualify our libraries by calling them Clay Libraries or Papyrus
Roll Libraries, why do we have to call them digital libraries”.
However there is a distinction to be made between
conventional libraries and digital libraries. Physical containers
for information, for example books, are capable of direct
access and can be managed physically. On the other hand,
digital data are made of electronic signals that rely on an
interpreting machine before there can be any human
interaction with them.


Let us take another look at the basic functions of conventional
libraries.


1. Collection : It includes techniques for the evaluation of
information resources directed towards target users, and the
cost-effective storage and preservation of such resources.


2. Organization and Representation : Classification
and cataloguing of information resources that are relevant to
the potential users.


3. Access and Retrieval : Access considerations
include design of physical space and organisation of materials
within such space to respond effectively to user needs and
expectations. Information retrieval has been addressed in the
design of systems specific to that task.


4. Analysis, Synthesis and Dissemination Functions :
Circulation and value-added services such as reference services,
producing evaluation reviews and devising community
outreach programs.


Librarians and Information Scientists have developed 
techniques, procedures and systems for addressing each of 
these functions for many kinds of data and presentation. 
Digital Libraries are unlikely to omit or add any of the roles
played by conventional libraries. Although the
implementations would depend on the local context and
technologies chosen, they must remain within commonly
accepted constraints and must satisfy the users.


Digital Libraries research has been focused on 
automating the activities carried out by librarians, such as 
automatic indexing and classification and expert systems for 
reference desks. Digital catalogues can support long
keywords along with differential weights, long user queries,
ranked retrieval etc. Information search via hypertext 
illustrates that indices can be implicit rather than explicit, giving 
users a seamless blend of primary and secondary works. 
Further, some current library activities may become irrelevant;
for example, circulation problems originating in a fixed number
of copies of each work simply disappear. We might redefine
and redesign library services to achieve the basic aims more
effectively than is possible now. Thus digital libraries not only
involve automation of each traditional library activity and
service, but also call for redefinition of services, new
groupings of services or replacements of groups of services
with other solutions.
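

As a rough illustration of the weighted keywords and ranked retrieval mentioned above, the following Python sketch scores catalogue records against a user query. The records, the keyword weights (on an assumed 1 to 10 scale) and the function name `rank_records` are hypothetical and invented for this example; production systems use considerably more refined weighting schemes.

```python
# Hypothetical catalogue: each record maps keywords to differential weights
# (assumed scale of 1-10; higher means the keyword is more central).
CATALOGUE = {
    "rec1": {"digital": 9, "library": 8, "metadata": 3},
    "rec2": {"library": 5, "architecture": 9},
    "rec3": {"metadata": 9, "digital": 4},
}


def rank_records(query_terms, catalogue):
    """Return (record_id, score) pairs, best match first.

    A record's score is the sum of the weights of the query terms
    it carries; records matching no term are left out.
    """
    scored = []
    for record_id, weights in catalogue.items():
        score = sum(weights.get(term.lower(), 0) for term in query_terms)
        if score > 0:
            scored.append((record_id, score))
    # Highest score first; ties are broken by record id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


results = rank_records(["digital", "metadata"], CATALOGUE)
for record_id, score in results:
    print(record_id, score)
```

Unlike a card catalogue, which either lists a work under a heading or does not, this kind of scoring lets every record that matches any part of a long query be returned in order of estimated relevance.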


Makri et al. conducted a survey in 2007 of users’
mental models of traditional and digital libraries, in which they
focused on eight themes about traditional and digital libraries.
It should be noted that the divide between traditional and
digital libraries is somewhat blurred by the use of electronic
catalogues to support searching in traditional libraries and of
occasional Internet searching to support finding documents
in both kinds of libraries. Their findings are as follows:


Theme 1: Similarities and Differences Between Traditional 
and Digital Libraries 


Participants regarded both traditional and digital 
libraries as having an element of hierarchical organization: 


A digital library is organized in a similar way in that it 
splits things up into articles and books and things like that 
and we suppose a physical library splits things up into sections 
such as journals and books, so they are kind of organized
in a similar way. 


Furthermore, the broad information seeking goals of 
users can be satisfied in both traditional and digital libraries. 
However, participants highlighted differences in the process 
of working with traditional and digital libraries: 


You use similar search terms, using “design,” “layout.” 
It means, you are both typing search terms into a box and 
clicking “go,” so they both start off similar. But using the
traditional library then moves over and you are browsing 
through books and looking at indexes, contents pages, looking 
through chapters, whereas this one just sort of stays online, 
and you are just looking through lists of abstracts and things. 


Superficial differences were identified between 
traditional and digital libraries: 


There is a lot more of them [available documents] 
because they are all electronic. 


These differences influence how each type of library is 
used with regard to the ownership of documents and how 
users go about using them to fulfil their wider information
seeking goals:


Going and looking for the book itself is a different 
experience as you are physically going to a floor and taking 
books off the shelf and we have to take it out for a set period 
of time and return it, whereas with a digital library we can just 
save an article to our hard disk.


Something we do in a physical library is that we pick up 
a book when we think it might be useful and scan read it. You 
have not got the option on a digital library and that makes us
a lot more choosy about what we think might be relevant in a
digital library. 


Although the overall goal of information seeking was 
deemed to be the same, participants were aware that different 
information seeking goals could be fulfilled by each type of 
library : 


One of the important differences would be the subject 
matter of what we are looking for. In a traditional library, we 
are looking for books so probably looking for searches under 
the title and the author because there is not a lot else that 
they would have entered into the library system. Whereas 
when we are looking on Google, it can search through the 
text, and ACM, the abstract as well as the title, the full text of 
the papers. 


Participants also highlighted that traditional and digital 
libraries have contrasting benefits. Digital libraries can bring
back seemingly irrelevant results, yet it is quicker and easier
for users to assess the relevance of electronic documents 
than traditional library catalogue entries, because only limited 
metadata about each document are displayed to users. 
However, this is counter-balanced by the perceived quality 
of physical resources, which may be available exclusively 
offline. 


Participants also noted that there are often fewer 
resources on the traditional library catalogue, which can lead 
to greater search accuracy when compared with searching 
in a digital library. Two LIS students, with their greater insights 
into cataloguing and classification, attributed this to human 
involvement in the cataloguing of electronic library catalogues. 
For example: 


The standard of cataloguing and classification of books 
is much higher than in digital libraries. 


Conversely, one LIS student identified the need for more 
careful selection of search terms using the traditional library 
catalogue which was attributed to out-of-date software and 
the fact that the catalogue does not support full text searching: 


The library catalogue does not cope with three-word 
terms very well. You have to be more specific in the catalogue. 
In the digital library, you can probably use many more search 
terms. The electronic catalogue or the software is not as good 
and is probably not as up-to-date. 


These findings indicate that users have a good idea of
the layout and procedures in the traditional library, in terms of
“how inputs become outputs.” However, with digital libraries
users tended to focus more on describing the common inputs 
(search terms) and outputs (search results) with varied levels 
of understanding of how search terms are turned into search 
results. This is further discussed below. 


Users seemed to be aware of how their information 
seeking goals could be accomplished in the context of both 
types of system. They took a more search-centred approach 
to information seeking in digital libraries than in traditional 
libraries, where both searching the electronic catalogue and 
physically browsing the shelves were common.


There was widespread disagreement about which 
resources returned the most relevant results and why. As 
noted above, some held the view that the electronic catalogue 
of the traditional library returned more relevant results than 
the digital library due to human involvement in cataloguing. 
Some assumed that the digital library would return too many 
results that the system would judge as relevant but the user 
would not. Another assumed that the electronic catalogue 
brought back fewer results because less thought had been 
put into designing the search component. This reflects the
impoverished nature of subjects’ mental models of the
searching and relevance ranking systems.


It is interesting to note that no comparisons between 
digital and traditional libraries were spontaneously given. 
When asked about the similarities and differences between 
digital libraries and “anything else used in the past” users 
made explicit reference to search engines and library 
catalogues. One participant argued that he would class digital 
libraries and search engines “in the same category because 
all you are doing is typing in words, trying to narrow down a 
topic which you are interested in, whether it be a paper, book 
or Web site, and finding it.” 


Another participant admitted to approaching a digital 
library search in the same way that he would approach a 
Google search: 


The way we used the quotation marks to separate
“focus groups” and “evaluation” [in the ACM Digital Library]
... was because of experience in using search engines like
Google, where we usually use quotation marks to do that.


When asked about the similarities and differences 
between traditional libraries and “anything else used in the 
past,” users were less willing to make analogies. Although 
some participants related the academic library to “other 
libraries,” no other concrete analogies were made. 


However, there probably is something ... like in a record
shop or book shop or something ... that whole physical part 
of just looking for a book ... but it is a whole different thing in 
a shop. It is a very different thing. 


Theme 2: Access Issues 


One aspect of library use that affected participants in 
both types of library was that of access rights. The notion of a 
library card as a key to accessing documents in the traditional 
library was held by all participants although, because the task 
did not explicitly ask participants to take documents out of 
the library building, there were not as many comments 
surrounding traditional library access as might be expected. 


As well as providing physical access to the library 
building, the library card was identified as a physical entity 
which holds information about the patron based on the 
barcode printed on the card. It was also regarded as an entity 
which would restrict the number of documents that could be 
loaned from the library at any one time: 


Whether we take an item out depends how much space
we have on our library card. If we had space we might just
get a few out [...] But if we only have space for three books,
then we would just sit and look at them and make sure that they
had stuff that we were sure was relevant.


Also in the traditional library, interdisciplinary students
highlighted the physical access issues surrounding
documents related to their course being “spaced around a 
little bit”, either in different sections of a particular library or in 
different university libraries altogether. 


Although participants typically had a better 
understanding of access in traditional than digital libraries, 
physical access assumptions were sometimes found to be 
erroneous. For example, one participant was unaware that 
documents could be requested from other university sites: 


“It puts me off when a book is elsewhere because it is
not like I want to get the book straight away or make so much
effort, I would have preferred it if there was a way to order
them so that I could view them at my own leisure.”


In addition, physical access issues can combine with 
other document access restrictions in a traditional library, such 
as the length of loan associated with a particular copy of a 
document: 


Oh, it says in Psychology there is also one, but this one 
is not a one week loan like the one in Psychology, so you 
could take it out and would not have to renew it every week. 


Access issues in digital libraries were found to pose far 
more of a problem than with traditional libraries. Digital library 
access restrictions also have the potential of creating more 
inconsistencies and errors in users’ mental models. Digital 
library users are often unsure as to why certain sections of 
digital libraries are restricted, whether registration/ 
subscription is required to view certain content and whether 
payment is required in order to view restricted content: 


It says “request document.” We are not sure if that
means we would have to pay for it.


This lack of clarity in users’ mental models relating to 
access restrictions discouraged users from using certain 
sections of the digital library. More than once, users made
the choice not to invest time in verifying their assumptions or
answering their questions surrounding the need to register 
and pay for access to the library. For example: 


Oh no! You have got to log in! That is probably why I
have not used [the ACM binder feature] before, because we
could not be bothered to set up a personal ACM account.
And we think you now have to register or subscribe or
something and we never know whether you have to pay or
whether you do not have to pay.


Access issues surrounding digital libraries can influence 
the behaviour of users, depending on how the user perceives 
the access restrictions to work. For example, users reported 
being discouraged from using libraries for which they did not 
have a clear idea of how access restrictions applied. 
Conversely, users reported often seeking out only information 
from sources that they knew they had unrestricted access to. 


Sometimes, users sidestepped the issue of electronic 
access by reverting to traditional forms of information seeking 
in order to retrieve the full text of documents that might be 
difficult to obtain due to access restrictions in a digital library: 


The access can be really slow and confusing when 
using different journal providers, so we would rather use
indexing services like LISA and the physical journals 
themselves. 


Some participants highlighted the confusion arising from 
having to access different information through different 
providers. This was highlighted by one electronic provider 
who redirected the user to other digital libraries to continue 
their information seeking task: 


Ok, that leads me to a different digital library [...]
Emerald [...] or maybe it is just a publisher of the book.


In addition, it is possible for the user to be re-directed 
to a site that is assumed to be another digital library but is 


|| r 


112 Manual of Digital Libraries 
not, and hence hampers the search for information: 


It leads them to a different database [...] ... [...]. At this
point we would not look at this further. It does not seem to
give us what we want.


Although one participant did note a positive benefit of 
access restrictions, in terms of helping to assure document 
quality, the study has shown that access issues are a source 
of confusion and inconsistency in users’ mental models, and 
that such issues can have a negative impact on user 
behaviour. Users were often unwilling to invest time in 
verifying assumptions, instead seeking only information from 
sources which provide unrestricted access, or reverting to 
traditional forms of information seeking. Participants 
demonstrated risk-averse, satisficing behaviours that avoided
potentially time-consuming exploration. This unadventurous 
behaviour prevented participants from developing more 
sophisticated mental models of access systems. 


Theme 3: Assessment of Library Content 


As seen above, users were sometimes unaware that
they could access documents in other physical libraries and 
focused entirely on the University Science Library. For digital 
libraries, there was more explicit recognition that the user 
had choices. However, users reported difficulties in knowing 
which digital libraries contain information on certain subjects, 
particularly if they had accessed the Athens portal 
(www.athensams.net), which provides managed access to a 
broad range of subscription-based digital resources: 


In the Athens there is lots of individual electronic libraries
and some of them have got certain stuff in and some of them 
have got other stuff in, we find it fairly hit and miss. 


Even once the user has identified which digital library 
may be relevant to their search, problems can arise with 
accessing it and ascertaining which journals are available in
the current library and which are not. This causes further
confusion. In addition, older journals are often not carried by 
digital libraries, which forces users to revert to printed 
collections or avoid using full text digital libraries altogether: 


Digital libraries mainly only show the last two years 
anyway, and often you will want to go back further than that 
and the access can be really slow and confusing when using 
different journal providers, so we would rather use indexing
services like LISA and the traditional journals themselves. 


Users’ poor understanding of which digital libraries 
contain information about certain subjects might be explained 
by confusion arising from non-firm boundaries between users’ 
mental models of individual digital libraries. Users experienced 
difficulties in predicting which goals could be accomplished 
at individual libraries and how these goals could be achieved. 
This extends beyond knowing what sort of information is 
available in each digital library to what access rights users 
have for which journals. The findings of Makri et al. highlight the
need to assist users in forming “bridges” between their mental
models of separate digital libraries to develop a more holistic
understanding of what is available here and now and how.


Theme 4: Document and Results Organization 


Within a particular library, the next issue is how 
materials are organized. On this topic, participants were more 
articulate about traditional than digital libraries. They 
described documents in a traditional library as being arranged 
hierarchically. In addition, two students noted that, although 
different libraries may have different classification systems
for organizing documents, the way of finding documents in 
traditional libraries is broadly similar: 


We have been through five or six years of universities 
and have used different libraries, but we usually approach it 
in the same broad way, we find the classmark and browse
the area.


Although the overall approach is common, participants 
reported sometimes seeking guidance from a librarian to help 
them understand the particular classification system within a 
traditional library. For finding particular documents, 
participants often learned where relevant sections of the 
library were located through a library induction or by using 
the signage on every floor and section of the library as 
guidance. Then, within a particular section, participants noted 
that documents are arranged in numerical order and 
alphabetically according to the classmark of the book. 


The location of the documents also provided users with 
some preliminary information about their relevance: 


Oh that’s interesting, it is in “ENGINEERING,” so maybe 
although the title of the book looks highly related to what we 
want, the title might be completely misleading. 


Within traditional libraries, due to their structure, much 
information access is achieved through browsing, which can 
lead to serendipitous discoveries. For example: 


Someone was actually browsing for another book in
the HCI section of this Science Library and another person 
was reading along the titles and came across it by accident 
in a book by Heath and Luff. 


A common perception amongst users was that the act 
of browsing was possible only in traditional and not digital 
libraries: 


The way we use a traditional library is we tend to find a 
general physical area and then browse, and you cannot do 
that with a digital library. 


The participants in this study did not draw parallels 
between locating a particular area of the physical library then 
browsing in the vicinity and issuing a search request in a digital
library then browsing through the results list. However, they
did describe the organization of digital resources in terms of 
searching and results organization. 


Participants had varying levels of understanding of how
searching works in a digital library and traditional library
catalogue, from the rather limited ... 


We typed some search terms in and it brought us some 
search results back. We do not think about it any further than
that. 


... to the comparatively sophisticated: 


We think it is all down to the way that it does the 
searching, it is all down to probabilities. The top things on the 
list mean that, for example, they are 90 percent likely that 
they have got it right and that percentage would reduce as 
you go down the list. 


Assumptions about exactly where the digital library was 
searching for their search terms also varied. Those 
participants who used the ACM digital library had the most 
difficulty in ascertaining how the retrieval process worked 
because of the apparent lack of correspondence between 
the search terms that were input and the results returned: 


We are not sure how exactly, whether it checks the 
keywords of papers or the titles or exactly how it brings back 
stuff that it thinks is relevant. Usually it is the case of putting 
in a few things and seeing what it comes back with. 


The symptoms of the lack of transparency between 
search terms and results were observable when users 
explained how the relevance bar in the ACM digital library 
works. Users distrusted the relevance bar, often ignoring it 
and using their own heuristics about how far they should trawl 
through results before they ceased the search. Surprisingly, 
two separate students made the unlikely analogy of the 
relevance bar in the library representing a “pint of Guinness”: 


We think from using this system before it goes by
“relevance” and it has got this thing on the right-hand side. It
looks like a pint of Guinness [laughs]. The more full it is then
the more relevant it should be to your search. We largely ignore
that.


Several participants perceived the results returned from 
a search in the ACM digital library as including a lot of low 
relevance results. One participant attributed this to the 
quantity of documents held in the library database and not to 
the fact that the search engine might be working in a different 
way to other digital libraries. As in their assessments of 
electronic catalogues, some students assumed that digital 
libraries that bring back more relevant results have their 
documents classified, at least in part, by humans. 


Non-ACM users also displayed different levels of 
understanding about how search results were returned. In 
these cases, confusion about the relevance of search terms 
led to useful assumptions or discoveries about how searching 
actually worked. This illustrates the important role of 
interaction and feedback in helping users develop appropriate 
mental models, a topic to which we return below: 


To be honest with you, we cannot see why it brought 
up “indexing and museum.” [Clicks on hyperlink]. Oh, here it 
is in the subject field [...] Because we asked it to do an
“all-fields” search, so it is not just searching the title fields, it is
searching the subject fields as well. 
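

The distinction this participant discovered, between a title-only search and an “all-fields” search, can be sketched in a few lines of Python. The records and the function name `search` are hypothetical and are not taken from any real catalogue; the sketch only shows why a record can be returned even though the search term is absent from its title.

```python
# Hypothetical bibliographic records, invented for illustration only.
RECORDS = [
    {"title": "Evaluation Methods", "subject": "usability"},
    {"title": "Indexing and Museum", "subject": "evaluation of catalogues"},
    {"title": "Pattern Languages", "subject": "architecture"},
]


def search(records, term, fields=None):
    """Return the titles of records in which `term` occurs.

    `fields` lists the fields to look in; None means an all-fields search.
    Matching is a simple case-insensitive substring test.
    """
    term = term.lower()
    hits = []
    for record in records:
        names = fields if fields is not None else record.keys()
        if any(term in record.get(name, "").lower() for name in names):
            hits.append(record["title"])
    return hits


title_hits = search(RECORDS, "evaluation", fields=["title"])
all_field_hits = search(RECORDS, "evaluation")
print(title_hits)
print(all_field_hits)
```

Here the all-fields search also returns “Indexing and Museum” because the term occurs in its subject field, which is the kind of result that puzzles a user who assumes only titles are being searched.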


Similar confusion surrounded how searching works on 
an electronic library catalogue within the traditional library: 


It is looking for results containing “Christopher” and 
“Alexander” but we do not think it has recognized the author,
we mean it has come out with articles with Christopher and
Alexander in them, but we would have expected articles by 
Christopher Alexander at the top of the screen. 


An awareness that they had a poor understanding of 
how search works led many users to try searches of which 
they had low expectations: 


So we are typing in “pattern languages,” but we know
this is a real long shot. 


We are not sure that word’s very useful, but we are 
going to do it anyway. 


Overall, traditional library users have built a strong 
knowledge of the layout of the library and how documents 
are organized. This is supported by physical cues such as 
signage and inductions run by librarians. This knowledge 
sometimes allows users to form assumptions about the 
potential relevance or utility of a document in the traditional 
library based on location. The lack of such depth of knowledge 
with regard to digital libraries suggests the need for more 
effective digital cues to help users understand how information 
is organized and presented. 


The perception that browsing is not possible in a digital 
library may be explained as users having “incomplete models” 
due, at least in part, to them focusing on search-based 
information seeking goals. This perception is exacerbated 
by the current interface designs of digital libraries, which tend 
to focus on search features. 


With regard to how searching works and results are 
organized, users’ varying levels of understanding might be 
explained by errors in the users’ perceived internal structure 
of the system, which in turn might be caused by a lack of 
relevant feedback from the system. It may also be due to 
users forming their own boundaries about what their mental 
model of a library should include. 


An interesting symptom of this varying level of 
understanding is the distrust that some students felt for the 
ACM relevance bar, resulting in the formation of heuristics
based on personal opinions of relevance that may or may
not be appropriate. Another symptom is the fact that users 
are often prepared to conduct searches that they anticipate 
are unlikely to yield valuable results, exhibiting “superstitious 
behaviour”. Because all users employed relatively 
sophisticated searching strategies and indicated a sound level 
of competence with searching digital domains, it is likely that 
lack of appropriate feedback when searching is the underlying 
cause of “how it works” confusion with regard to searching 
and obtaining relevant and consistent results. 


Theme 5: Understanding of Search 


As seen in Theme 1, none of the participants made
specific comparisons between components of the traditional 
and digital libraries except when asked to. However, 
participants spontaneously made comparisons between 
digital libraries and other digital entities, such as Internet 
search engines, e-commerce sites, and electronic library 
catalogues. 


Some participants assumed that the search engine 
components of Internet search engines, e-commerce sites, 
and digital libraries work in a similar manner, even if surface 
differences exist at the interface level or in what format search 
terms should be entered: 


We think the fundamental technology is similar but there 
are specific differences at the interface level in terms of 
options you can select [. . .] and the different ways in which 
they classify the things in the database. 


This led this participant to adopt similar searching 
strategies in both an Internet search engine and a digital 
library. Some participants used the same search terms to 
search digital libraries and the traditional library catalogue. 
Others used the search components in different ways, 
although confusion was rife surrounding just how similar the
searching processes of digital libraries and Internet search
engines are. Some of the confusion surrounding how the 
search components of digital libraries and Internet search 
engines work might arise from the blurred distinction between 
them, which is itself symptomatic of the disagreement in the 
definition of the term “digital library” as identified in the 
literature. For example: 


It comes back to the issue of “what is a digital library?” 
because some people argue that Google is not a digital library, 
it is just a search engine. But if you look at a digital library like 
the ACM, well that is just a search engine! [...] all you are 
doing is typing in words, trying to narrow down a topic which 
you are interested in. 


Any comparison between search components of digital 
libraries and e-commerce sites such as Amazon seemed to 
create less confusion amongst participants, because fewer 
surface similarities exist between the search engine 
component of the Amazon site and the search engine 
components of digital libraries. Because there are more 
surface similarities between digital library and electronic 
catalogue search engine components, the confusion 
surrounding digital libraries and Internet search engines 
extends to the traditional library catalogue: 


We do not know if you can do this [wildcard searching] 
on this [electronic catalogue] actually. Usually if you put a 
star [refers to ‘child’] it just covers everything like child’s, 
children on the end there. We may put that star on it just to 
see, [Types in ‘child library’ and presses search button]. No: 
“no exact match.” We know you can do that on the Internet 
search engines, but whether you can do that on here, we are 
not sure. 
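

The wildcard (truncation) searching this participant attempted can be sketched as follows. This is a minimal illustration using Python's standard `fnmatch` module, in which `*` stands for any run of characters; the titles and the function name `matches` are hypothetical. Whether a particular catalogue or digital library supports such syntax depends on the system, which is exactly the uncertainty the participant describes.

```python
from fnmatch import fnmatchcase

# Hypothetical titles, invented for illustration only.
TITLES = [
    "Children in the Library",
    "The Child Library Service",
    "Library Buildings",
]


def matches(query, title):
    """True if every query term matches at least one word of the title.

    A term such as 'child*' matches 'child', 'children' and so on;
    a term without a wildcard must match a word exactly.
    """
    words = title.lower().split()
    return all(
        any(fnmatchcase(word, term.lower()) for word in words)
        for term in query.split()
    )


exact_hits = [t for t in TITLES if matches("child library", t)]
wildcard_hits = [t for t in TITLES if matches("child* library", t)]
print(exact_hits)
print(wildcard_hits)
```

A system that treats the query as exact words finds only the second title, while one that honours the wildcard also finds the first.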


One participant, when using the digital library, found a 
review of a book that had been found earlier in the traditional 
library search. The participant attributed finding the book in
the traditional library catalogue and the review of the same
book in the digital library to using similar search terms in both 
types of library: 


Yeah, it had a record of the same book that is here in 
the library, which we initially found using the same search 
terms. Well not exactly, because we had to restrict our search
terms for the library catalogue. 


Another participant displayed an understanding that it 
could be the underlying search technology and not just the 
superficial differences in required search syntax, which has 
an impact on the results obtained from an Internet search 
engine, digital library and traditional library catalogue: 


A lot of better search engines will generate much more 
reliable results, so our expectations are that if we use one
that uses a particular [searching] technique, we assume that 
others, even in a digital library might work using the same 
technique, and this one does not. 


No users could clearly articulate the differences 
between the different search engines they worked with, 
indicating that users have incomplete models of search engine 
components. Such incomplete models have a negative
impact on users’ understanding of the functionality of the
overall system.


Where users recognized that there might be subtle 
differences in how search engine components work across 
digital media, they did not understand how this should affect 
their behaviour, for example, in how they formulated queries. 
This suggests that there is scope for providing clearer 
information to users about how search engine components 
belonging to different search entities should be used. 


Theme 6: Assessment of Document Relevance 
Once documents have been located, the user needs
to establish how useful they might be. This applies in both
traditional and digital settings. In a traditional library, 
participants noted that some of the potential relevance of a 
document can be derived from physical attributes, such as 
the following: 


You can sort of tell [the age and relevance of the book] 
from the last time it was borrowed. 


They have got multiple copies of these, so obviously 
they are good books [...] and they look like general, and they 
are quite thick. 


This book looks very American because of all the glossy 
photos on the front. 


Document relevance in a traditional library was also 
ascertained by flicking or skimming through the entire book 
or reading particular sections of the book such as the contents 
or index pages: 


We would give it a flick. In this case we might look for
authors or we might look for subject matter. 


As noted above, in the traditional library, potential 
document relevance was also judged by its location or 
classmark. Participants highlighted the difficulty of judging 
the relevance of a document in the traditional library by using 
the traditional library catalogue alone, because very limited 
details are provided to aid users in making the judgement: 


You really do not have enough information on the 
catalogue to determine whether they will be useful or not, so
you just take a few approximate classmarks and go and 
traditionally look on the shelves. 


This can lead users to dismiss a particular document 
as irrelevant purely due to the lack of metadata provided about 
the document in the traditional library catalogue: 


Ones like this without a year showing we do not tend to
bother with because they are either not catalogued properly
or so ephemeral that they do not even have a year. 


Assessing the contents and relevance of documents is 
clearly harder if the document is located at another site, and 
some users were unaware that documents could be 
requested from other sites, making relevance judgements 
costly. Looking up references and citations was highlighted 
as an important way of ascertaining document relevance in 
both traditional and digital libraries: 


If there is a good book on this area, these two papers 
are likely to quote from it. 


In a digital library, where there are fewer types of cues 
to the user, participants placed more emphasis on reading 
the titles and abstracts of documents than on skimming 
through the contents for assessing relevance. 


Overall, users tend to ascertain relevance of traditional 
and digital documents in different ways, which suggests that 
users are aware of the appropriateness of the methods and 
actions that they employ. For example, “flicking” through and
“scanning” paper documents is currently far easier than the
electronic equivalent and is hence better suited to use in a
traditional library, while the abundance of abstracts in digital
libraries makes scanning an abstract a feasible alternative to
attempting to scan the document by scrolling through it.


Theme 7: Revising the Model 


Participants often tried to clarify assumptions in their 
mental models by observing the feedback that resulted from 
them performing selected actions with the digital library. This 
had a positive result in many cases, helping people identify 
errors in their models that would allow them to use digital 
libraries more effectively. 


Testing assumptions in the mental model can also have
a negative result. In the case below, the participant assumes
that the CrossRef search front-end to the ACM digital library 
finds documents written by the same author. Because this is 
one, but not the only, function possible using this front-end, 
the user’s assumptions have led to the construction of a highly 
limited mental model of how it works, and in this particular 
instance, the feedback from the system supports this limited 
model. 


We are tempted here to put in an author’s name that we
knew of to see if it brought up similar articles. Actually, let’s
try it. We will type in “Drew” because this author does lots of
things on usability [conducts search]. This is showing all the
articles that the author has written, we think, because this
person is always included in the authors’ list. So maybe it does
do what we thought it would!


System feedback was also found to influence users’ 
future searching behavior by suggesting potentially useful 
search terms for subsequent searches. These either 
described a slightly different but relevant aspect of the search 
topic or were synonyms of search terms used previously. 


These findings highlight how system feedback can play 
a key role in ensuring that users maximize their understanding 
of the system and thereby form a more complete mental 
model of that system. Effective feedback can help users to 
spontaneously construct new models of unfamiliar systems 
or aspects of systems and to revise their existing mental 
models. 


Theme 8: Troubleshooting Issues 


Participants often had troubleshooting strategies in a 
traditional library to support information finding. This might 
involve (a) checking the surrounding area of the shelves 
where the document was supposed to be, (b) checking the 
returns trolley, and (c) checking whether the document was
out on loan before either asking a librarian for help or
requesting that the book be held if and when it is returned. 
When more detailed searching assistance was required, 
participants reported turning to a librarian for support. One 
participant also surmised that librarians could be a useful 
source of information if the electronic catalogue was not 
functioning, because they might keep a paper record of how 
documents are organized in the traditional library. 


Participants regarded troubleshooting information-seeking
problems in a digital library as far less straightforward
than in a traditional library:


Usually we just give up, it is very frustrating! [...] We
have e-mailed them about it and they have not got back to
us! It is frustrating, it is awful, because you have got nowhere
to go [for help]. At least in the [traditional] library, there might
be something next to it that might still be relevant. You can
get round it more, we think, with traditional things, but when
it is a digital library, we just feel hopeless!


Participants did, however, suggest potential avenues 
for exploration if a document could not be found in a particular 
digital library, by turning to either another digital library or a 
general Internet search engine. For example: 


We would probably try the Internet, and that is huge; it
is not catalogued or classified, so it is not technically a library,
but you might find something on the Internet.


Traditional library users held detailed knowledge of 
library procedures. This allowed them to form well-reasoned 
behavior patterns when they could not find a particular 
document where it was supposed to be on the shelves or 
encountered other problems in the library. In effect, this 
provided users with procedures for troubleshooting and 
maintenance. 


Users recognize that some of this troubleshooting
knowledge can be provided by librarians, along with other
“how to use it” and “how it works” knowledge about the library 
and about information seeking strategies in general. Current 
digital libraries do not facilitate troubleshooting of this kind, 
particularly with regard to searching behaviour. This highlights 
the need for digital libraries to facilitate troubleshooting in order 
to avoid users taking potentially inappropriate remedial action 
when things go wrong, such as turning to another digital library 
or a general Internet search engine. 


The study has highlighted areas for attention in digital 
library design if users of those libraries are to develop richer 
mental models that will make them more effective information 
consumers. Users need better support for (a) understanding 
how to formulate effective queries, (b) understanding why 
they have got particular results, (c) assessing document 
relevance, (d) gaining confidence regarding access rights, 
(e) exploring to support the development of useful mental 
models, and (f) making appropriate comparisons between 
different systems, including search engines, e-commerce
sites and electronic library catalogues. Such advances have 
the potential to transform users’ interactions with digital 
information resources. 


Thus, the above study shows that traditional (conventional)
libraries are not going to become obsolete in the future, and
on many occasions they have proved their importance.
However, digital libraries are becoming the first
choice of users.


2.6. DIGITAL LIBRARY MYTHS 


There are many expectations of, and much confusion about,
digital libraries. Terry Kuny and Gary Cleveland have tried
to explore some common myths associated with digital
libraries.


Myth 1: The Internet is the Digital Library 


The word “library” has been appropriated by many
different groups to signify simply a collection of digital
objects that people can access from their desktops. A global
information network, of which the Internet is the seed, gives
the illusion of promising fingertip access to the world’s
information. A fairly
spectacular example of what many people consider to be a 
digital library today is the World Wide Web. But is this a “digital 
library”? For many common library requests, locating 
information on the Internet remains highly inefficient 
compared to traditional library sources, especially for 
unfamiliar users. Finding information is difficult, the quality of 
the information is quite variable, and reliable, professional 
assistance for the confused and lost is lacking. 


Myth 2: The Myth of a Single Digital Library or One-window 
view of Digital Library Collections 


Even modest moves towards increasing digital
collections and services will be strongly affected by future
copyright and licensing regimes, as well as prohibitive costs
for digitization and support of technical infrastructure. More
importantly, the digital future will be an unruly one composed
of a multiplicity of competing information providers. Libraries
will be only one source of information. “Prime” information
resources will probably be locked into proprietary collections,
essentially “private digital libraries”, which are accessible on
a subscription or pay-per-use basis. Developing
interoperability standards for locating and retrieving
information in this highly distributed and heterogeneous
environment will be a considerable challenge in its own right.


Myth 3: Digital Libraries will provide more Equitable Access, 
Anywhere, Any Time 


A great deal of work must be done to turn this myth into
reality. We can assume that a global computer network or
the Internet or some descendant will be the primary delivery
mechanism for digital information. Equitable access is
currently compromised by the fact that the Internet is not
available to everyone equally. Furthermore, the connections
that do exist for most people are slow. For a digital library to 
provide equitable access to information, it is imperative that 
the same universal availability that is a characteristic of the 
communication system is also a characteristic of the network. 
In the future, complex multimedia resources and services may 
have specialized hardware and software requirements such 
that only a limited number of workstations can actually access 
the information. 


Myth 4: Digital Libraries will be Cheaper than Print Libraries 


A common assumption among technology reporters 
about the costs of “digital libraries” is that digital is cheaper 
than paper. This contention is far from established in fact or 
in practice. Although many libraries project savings, especially 
when substitution strategies are used which replace selected 
serials titles with document delivery services, the cost-benefit 
analysis of making this switch remains unclear. Many libraries 
now devote significant resources for hardware and software 
infrastructure. 


There is an increasing unease among members of the 
library community that copyright changes will adversely affect 
the ability of libraries to provide digital collections and services. 
A statement by the International Publishers Copyright Council on
digital library collections gives a sense of the challenge that
librarians face:


“Many national and regional libraries contemplate 
digitizing their print collections to facilitate a virtual library that 
can provide service to patrons at remote locations and 
facilitate resource-sharing. Such a concept will destroy not 
only the incentive to create new copyrighted works, but the
revenue from existing works that provides the investment in
new works by authors and publishers... No longer will libraries
be the sole repository of published matter. No longer will 
libraries be the only means of obtaining archival information. 
In some areas, libraries will be able to fulfill their function by 
merely pointing to other electronic repositories and in others 
they will seek out more active roles”. 


It is no surprise that digital content providers all over the
world are resorting to contract agreements and licensing
mechanisms instead of normal copyright provisions.


Finally, consider: is a digital library possible? With this
new electronic information infrastructure in place, are libraries 
still needed? Even though many libraries are actively engaged 
in digital library projects, it is not clear what is meant by the 
phrase. It may be the case, in fact, that “digital library” is an 
oxymoron, if we consider a library something more than a 
random selection of documents and objects. 


The convening and social functions of the library 
building are important contributions, but the intellectual 
integrity of collections built and nurtured by knowledgeable 
individuals is a lasting tribute to the scholarly community. This 
is the function that may not be readily accommodated in a 
digital library. 


As librarians try to adapt their contributions to a new 
role in the digital world, many have described the new library 
not as a unified, quasi-comprehensive collection of 
information resources but rather as a gateway to the many 
information resources that are available electronically. 


Richard Rockwell, in addressing a conference, 
“Gateways to Knowledge: the Role of Academic Libraries in 
Teaching, Learning and Research,” at Harvard University, 
defined the gateway library as “an integrated and organized 
means of electronic access to dispersed information 
resources.” The gateway library is not a place but a process
that delivers services to the user. The digital library, then, is
a step toward a gateway library but lacks the organization 
and retrieval mechanisms that would ultimately be available. 
As appealing as the concept is, what does this mean for 
traditional research libraries that are, today, the single largest 
asset of most university campuses? 


Digital library projects, sometimes mistakenly called 
digital libraries, are heavily biased toward the public-service 
model. The work under way in most research libraries is 
geared toward establishing linking and pointing mechanisms 
that direct the users to the vast array of electronic resources 
that are available through the Internet. The preponderant 
concerns have been with user-centered search and retrieval 
mechanisms. Many campuses are espousing the language 
used by the University of Michigan to describe digital 
initiatives: “As the computing environment has become more 
distributed, the development of systems to facilitate the 
location and retrieval of digital collections . . . has become a 
priority. The underlying goal is to recognize the rich array of 
individual or unit created digital resources as part of the 
evolving, broad notion of a campus digital information 
environment.” In other words, the gateway function of leading 
to other resources not held by the traditional library has been 
emphasized, but there has been relatively little concentrated 
attention, thus far, on what would be required if a library tried 
to create a digital version of itself that emphasizes coherent 
content. 


Since it is unlikely that librarians will conclude that they 
are obsolete in a digital environment, the number of digital 
projects and digital library creations will simply accelerate. If 
for no other reason, the financial incentives for undertaking 
digital projects are too great to ignore. In addition, the 
prospects for making little-known primary resource materials 
available to entirely new audiences are an attractive 
motivation to think about digital libraries. 


3
Digital Library Components and Services


A digital library is the way a traditional library uses
technology and innovation to meet customer demand; the
common thread is the “use of technology to improve
customer delivery, as well as the search and retrieval
technology available on electronic networks”. There are five
common elements of digital libraries, as follows:


The digital library is not a single entity; 


The digital library requires technology to link with the 
resources of the many; 


The linkages between the many digital libraries and 
information services are transparent to the end users; 


Universal access to digital libraries and information 
services is a goal; and, 


Digital library collections are not limited to document 
surrogates— they extend to digital artefacts that cannot 
be represented or distributed in printed formats. 


The digital library may also be called the library without
walls, virtual library, electronic library, e-library, desktop
library, online library, future library, library of the future, logical
library, networked library, hybrid library, gateway library,
extended library or information superhighway. Of these many
terms, virtual library, digital library, hybrid library and electronic
(or e-) library are the most commonly used.


Digital libraries may be made up of a number of
components, including the Internet and intranets; integrated
access to information; digitization of materials; access to 
electronic publications; electronic document delivery; 
resource sharing; cooperative developments; and, end-user 
services. These elements, then, may be considered the basic 
building blocks of the virtual library, although the nature and 
extent of the application of each component will depend upon 
the circumstances and needs of the library and/or organization 
to which the library is attached. 


The development of digital libraries appears to be 
heavily dependent on a number of inter-related enabling or 
hindering factors, comprising legal issues; financial issues;
client issues; personnel issues; organisational issues; 
management issues; technological issues; collaboration 
issues; and, subject discipline issues. Although writers in the 
field differ on the importance of the individual issues, it is 
widely agreed that continued progress and ultimate success 
of digital libraries is dependent on their resolution. If this is 
the case, then these issues make up an environment that 
surrounds the digital library itself. This may be modelled as 
shown in Figure 3.1. 


3.1. COMPONENTS OF DIGITAL LIBRARY 
The digital libraries are complex systems that — 
— help to satisfy information needs of users (societies), 
— provide information services (scenarios), 


— locate and present information in usable ways (spaces),
— organize information in usable ways (structures), and
— communicate information with users and computers
(streams).


[Figure 3.1: diagram of the digital library components and the
issues that surround them. Legible labels include Integrated
Access to Information, Digitisation of Materials, Electronic
Publications, Electronic Document Delivery, End User Services,
Organisational Issues, Personnel Issues and Collaboration
Issues.]
Fig. 3.1. Digital Libraries and their Environment


All five S’s above, i.e., societies, scenarios, spaces,
structures and streams, must be addressed when building a
digital library. But
the actual content of digital libraries is made up of a number 
of digital objects. In some cases these may be thought of as 
data sets, e.g., a table of results, the genomic information for 
an individual. In others they may be multimedia information, 
such as an image, graphic animation, sound, musical
performance, or video. Many can be thought of as documents,
which carry content in some structure or structures, perhaps
made up of logical or physical divisions such as sections or 
pages. Thus, into the foreseeable future, digital libraries will 
be hybrid constructs, where paper, microforms, CDs, DVDs 
and other media carry much of the content. 


Establishing digital library resources and services
requires many new infrastructural components that
are not available off-the-shelf as packaged solutions. There
are no turnkey, monolithic systems available for digital
libraries; instead, digital libraries are collections of disparate
systems and resources connected through a network and
integrated within one interface, currently the web interface.
The use of open architecture and standard protocols, however,
makes it possible for pieces of the required infrastructure, be it
hardware, software or accessories, to be gathered from
different vendors in the marketplace and integrated to
construct a working environment. Several components
required for establishing digital libraries would be internal to
the institutions, but several others would be distributed across
the Internet, owned and controlled by a large number of
independent players. The task of building a digital library,
therefore, requires a great deal of integration of various
components. Components required for a digital library can
broadly be divided into the following five categories and are
depicted in Fig. 3.2.


• Collection Infrastructure
• Access Infrastructure
• Computer and Network Infrastructure
• Digital Resource Organization
• Manpower Training


3.1.1. Collection Infrastructure 


The most important component of a digital library is the
digital collection it holds or has access to. The viability and
usefulness of a digital library depend upon the critical
mass of digital collection it has. The collection infrastructure
typically consists of two components, i.e. metadata and digital
objects. The metadata provides bibliographic or index
information for the digital objects, while the digital objects are
the primary documents that users are interested in accessing. It
is metadata that facilitates their identification and location
using a variety of search techniques. The information contents of
a digital library, depending on the media types it contains, may
include a combination of structured / unstructured text,
numerical data, scanned images, graphics, audio and video
recordings. Different types of resources need to be handled
differently in a digital library. Rusbridge has divided resources
for a digital library into four distinct categories, i.e. legacy,
transition, new and future.


[Figure 3.2: diagram of digital library components. Legible
labels: Push Technology (Alerting Services, My Profile, Springer
Link Alert, Content Direct, My Journal); Pull Technology
(WebPAC, Virtual Library Tours, Library Web Sites, Library
Portals, Digital Reference Service, Real-time Digital Reference
Service, Web-based User Education, FAQ, Library Calendar, Web
Forms, Discussion Forum and Listservs); Digital Resource
Organization.]
Fig. 3.2. Components of Digital Library
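The split between metadata and digital objects described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class names, the fields and the sample record are all hypothetical, and a real digital library would normally follow an established metadata schema rather than this ad hoc one.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """The primary document itself (hypothetical minimal fields)."""
    object_id: str
    media_type: str   # e.g. "text", "image", "audio", "video"
    location: str     # where the content is stored (path or URL)


@dataclass
class MetadataRecord:
    """Bibliographic or index information describing one object."""
    object_id: str
    title: str
    creator: str
    keywords: list = field(default_factory=list)


class Collection:
    """Holds objects and the metadata used to identify and locate them."""

    def __init__(self):
        self.objects = {}
        self.records = []

    def add(self, obj, record):
        self.objects[obj.object_id] = obj
        self.records.append(record)

    def find(self, term):
        """Search the metadata only, then return the matching objects."""
        term = term.lower()
        return [
            self.objects[r.object_id]
            for r in self.records
            if term in r.title.lower()
            or term in (k.lower() for k in r.keywords)
        ]


collection = Collection()
collection.add(
    DigitalObject("map-001", "image", "objects/map-001.tif"),
    MetadataRecord("map-001", "Survey map of Haridwar", "Unknown",
                   ["maps", "Uttarakhand"]),
)
print([o.location for o in collection.find("maps")])
```

The search never inspects the stored content. It consults only the metadata and then returns the location of the object, which is the role the text assigns to metadata.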


Legacy resources are largely non-digital resources,
including manuscripts, prints, slides, maps, audio and video
recordings. In spite of the fact that large investments are being
made in the process of digitization of resources, the vast
majority of existing legacy resources will remain outside the
electronic domain for many years to come. These legacy
resources are the major resources of existing libraries.


Transition resources, primarily designed for another medium
(mostly print), are those which are being or have been
digitized, making the transition into the digital world. Such
resources are converted for increased access and to reduce
reliance on physical libraries. The transition resources are
either digitized images or images that are converted to text
by the process of Optical Character Recognition (OCR).


New digital resources are either expressly created as digital or
are created in parallel to print. Publishers are increasingly
moving to XML or SGML formats. These formats are used to
generate the data files required for producing printed outputs. The
SGML / XML databases are also used for generating HTML /
PDF / XHTML or PostScript files dynamically using appropriate
DTDs. New digital resources are designed with a particular
use in mind, employing new Internet and web technologies
embodying a great variation and value addition. There is an
increasingly wide range of digital resources, from formally
published electronic journals and electronic books through
databases and datasets in many formats, i.e. bibliographic,
full-text, image, audio, video, statistical and numeric datasets.


Future resources may contain data sets which are not
formally specified.
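The idea of generating HTML dynamically from an XML source can be illustrated with a small, hedged sketch. The element names (`article`, `title`, `author`, `para`) and the sample text are assumptions made for this example; a publisher's production workflow would rely on its own document type definitions and stylesheets rather than hand-written code like this.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article markup, invented for this illustration.
ARTICLE_XML = """<article>
  <title>Digital Preservation &amp; Access</title>
  <author>A. Author</author>
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
</article>"""


def article_to_html(xml_text):
    """Render the assumed article structure as a simple HTML page."""
    root = ET.fromstring(xml_text)
    parts = ["<html><body>"]
    # The title becomes the page heading; escape() re-encodes characters
    # such as '&' that the XML parser has already decoded.
    parts.append("<h1>" + escape(root.findtext("title", default="")) + "</h1>")
    parts.append("<p><em>" + escape(root.findtext("author", default=""))
                 + "</em></p>")
    for para in root.findall("para"):
        parts.append("<p>" + escape(para.text or "") + "</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


print(article_to_html(ARTICLE_XML))
```

Because the content is held once in a structured source, the same file could equally feed a print or PDF rendering. This is the advantage the passage attributes to SGML / XML databases.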


There is an increasingly wide range of digital resources,
from formally published electronic journals and electronic
books, through databases and datasets in many formats like
bibliographic, full text, image, vector / map, audio / video,
statistical and numeric datasets. The object-oriented world
of digital objects, packaging the data resources and the
access or processing method, particularly where
machine-independent code is used for the access methods, holds
out the best hope for resources of the future.


A digital library is not a single entity, although it may 
have digital contents created in-house or acquired in digital 
formats stored locally on servers. A digital library may also 
act as a portal site providing access to digital collections held
elsewhere. The digital constituents of a digital library are
shown in Fig. 3.3.


[Figure 3.3: diagram with "Digital Library" at the centre,
surrounded by: Acquiring Digital Media; Buying Access of
Resources; Content Creation ("Born Digital"); Conversion Print
to Digital; Portal Sites; Integrated Access.]
Fig. 3.3. Digital Constituents of Digital Library


3.1.2. Principles for Good Digital Collections 


Just as a library collection is more than a random 
assemblage of books and journals and a museum collection 
is more than a random assemblage of artifacts or specimens, 
a digital collection of information resources is more than a 
random assemblage of digital objects. Collections imply 
selection and organization. Collections typically also require 
descriptive, structural, and/ or administrative context, typically 
in the form of metadata, usually at both the collection level 
and the item or object level. The framework principles for good 
digital collections derive from this understanding of the nature 
of collections. They specify what is most often necessary to
create a good digital collection but are not prescriptive about
how such specifications must or should be satisfied. 


A good digital collection should have the following features:


— A good digital collection should be created according 
to an explicit collection-development policy that has 
been agreed upon and documented before digitization 
begins. 


— Collections should be described so that a user can 
discover important characteristics of the collection, 
including scope, format, restrictions on access, 
ownership, and any information significant for 
determining the collection’s authenticity, integrity, and 
interpretation. 


— A collection should be sustainable over time. In 
particular, digital collections built with special funding 
should have a plan for their continued usability beyond 
the funded period. 


— A good collection should be broadly available and avoid
unnecessary impediments to use. Collections should 
be accessible to persons with disabilities and usable 
effectively in conjunction with adaptive technologies. 


— A good collection should respect intellectual property 
rights. Collection managers should maintain a 
consistent record of rights holders and permissions 
granted for all applicable materials.


— A good collection should provide some measurement 
of use. Counts should be aggregated by period and 
maintained over time so that comparisons can be
made. 


— A good collection should fit into the larger context of 
significant related national and international digital 
library initiatives. 


Thus, a good digital collection should possess the features
listed above.
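The principle that use counts should be aggregated by period and kept over time can be sketched as follows. The access log, the item identifiers and the choice of a calendar month as the period are all assumptions made for illustration; they are not taken from any particular library system.

```python
from collections import Counter
from datetime import date

# Hypothetical access log: (date of access, item identifier).
ACCESS_LOG = [
    (date(2024, 1, 5), "map-001"),
    (date(2024, 1, 20), "thesis-017"),
    (date(2024, 2, 2), "map-001"),
    (date(2024, 2, 14), "map-001"),
    (date(2024, 2, 28), "journal-204"),
]


def monthly_counts(log):
    """Aggregate accesses per calendar month, sorted chronologically."""
    counts = Counter()
    for day, _item in log:
        counts[day.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


def item_counts(log):
    """Aggregate accesses per item across the whole log."""
    return Counter(item for _day, item in log)


print(monthly_counts(ACCESS_LOG))
print(item_counts(ACCESS_LOG).most_common(1))
```

Keeping the per-period totals, rather than only a running grand total, is what allows one period to be compared with another as the principle requires.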


3.1.3. Acquisition of Collections available in Digital Formats 


The availability of CD-ROM, and more recently DVD-ROM,
as a medium with high storage capacity, longevity and
ease of transportation, triggered production of several
CD-ROM-based information products including several
bibliographic databases that were earlier available only
through online vendors or as abstracting and indexing
services in printed format. Thousands of CD-ROM databases
are currently available from a multitude of CD-ROM producers
including Silver Platter, which alone produces more than 250
CD-ROM information products. Moreover, several full-text
databases also started appearing in the late 1980s and early
1990s, launching the beginning of a new digital era. Some of
the important full-text digital collections available on CD-ROM
include: ADONIS, IEEE / IEE Electronic Library (IEL),
ABI/INFO, UMI’s International Business Database, UMI’s General
Reference Periodicals, Espace World, US Patents, etc.
CD-ROM networking technology is now available for providing
web-based simultaneous access to CD-ROM databases on
the Local Area Network (LAN) as well as on the Wide Area
Network (WAN). More evolved technology allows caching
of the contents of CD-ROMs on to a server, which, in turn,
provides web-based simultaneous and faster access to the
information contents of CD-ROMs. The libraries have an
option to subscribe to these full-text databases as a part of
their digital library. The Silver Platter’s Electronic Reference
Library (ERL) technology facilitates uploading of contents of
ERL-compliant CD-ROM databases on the hard disc of an
Intranet server, which, in turn, provides integrated access and
search on ERL-compliant databases through an Intranet
server. Moreover, individual research articles in the
ERL-compliant database are linked to their full-text research
articles using Silver Platter’s Silver Linker.


3.1.4. Buying Access to External Digital Collections 


Libraries will not themselves become digital libraries; rather,
they will acquire access to ever-growing digital collections on
behalf of their users. The majority of these collections are
provided by external sources such as commercial publishers,
collections mounted by scholarly societies, resources at other
libraries, electronic journal sites, etc. The Internet has long been
a favourite medium for experimenting with electronic publishing
and delivery.


The technology is now available for creating fully
digitized multimedia products and making them accessible
through the Internet. Technological changes, especially the
internet and web technology, continue to attract more and 
more traditional players to adopt it as a global way to offer 
their publications to the international community of scientists 
and technologists. Most of the important publishers now have 
their web-based interfaces to offer full texts of their journals. 


The current electronic publishing market consists of 
traditional players offering electronic versions of their print 
journals as well as several new enterprises offering new 
products and services that are ‘born digital’. The market also
has several subscription agents in their new role as 
aggregators. These players include: 


Publishers and Scholarly Societies : Most well-known 
commercial publishers of traditional journals such as Elsevier 
Science, Kluwer Academic Press, Academic Press, Springer 
Verlag, Wiley InterScience and scholarly societies such as 
SIAM, ACM, IEEE/IEE are making their publications available 
online through their web sites. Several universities host 
specialised collections on their web sites. Several universities, 
as members of the networked digital library of theses and 
dissertations (NDLTD) initiative, host doctoral dissertations 
submitted to their respective universities. 




Aggregators : Third-party aggregators provide access
to numerous journals from a variety of publishers. Aggregators
include organizations like JSTOR, which offers extensive backfiles
for more than 100 academic journals, and OCLC Electronic
Collections Online, which offers full-text access to more than
two thousand titles via its FirstSearch service. Other
aggregators like Lexis-Nexis, Bell and Howell (UMI) and Web
of Science (ISI) offer searchable indexes with links to full-text
journals on the publishers’ sites. EBSCOHost, IAC Trac
SearchBank and Blackwell’s Electronic Journal Navigator
(EJN) provide a common search interface for the journals
aggregated by them from an assortment of publishers. Besides,
a growing number of subscription agents are working with
publishers to provide aggregated services for packages of
titles or for full-text databases.


Pricing Model : The total number of electronic
journals available on the web has grown steadily from fewer
than 10 in 1989 to 8500 in April 2000. One of the major
issues that concerns publishers is how to protect their
economic interests while providing electronic
access to their printed publications. Publishers make a
significant investment in the process of production of a journal 
which involves activities like peer-review, administration, 
editing, layout design, production, subscription management 
and distribution. Most activities that are performed for
publishing a journal are common to both electronic and paper
media, except for production and distribution, where the cost
involved for the electronic version is relatively low. Moreover,
electronic versions of journals generally provide additional
features like links to corrections, links to additional materials,
e-mail links to author(s), etc., which require additional work on
the part of the publisher. Tenopir and King concluded in a study
that the cost of electronic journals cannot be substantially lower
than that of their printed versions.




Journals are made available through the web under varying
pricing models. In a survey of 8001 peer-reviewed electronic
journals conducted by EBSCO, it was found that 50 percent
of electronic journals are free with their print journals, 34
percent require additional payment over their print
subscription and 16 percent are available online only, without
a print counterpart. Overall, 84 percent of the journals require
a print subscription as a prerequisite for online
access to their electronic version. Some prevalent pricing
models are as under:


Electronic Subscription linked to Print Subscription : 
The electronic subscription to a journal is, in most cases,
linked to its printed counterpart, i.e., it may be offered
free with the print subscription (e.g., publications of American
Society for Physics and ASCE) or priced at a fixed percentage
over the print subscription (e.g., IEEE’s ASPP package).


Electronic Subscription with Campus Licenses : 
Electronic publishers facilitate campus wide unlimited access 
to subscribed journals on payment of a fixed amount of 
platform fee. Example: Elsevier Science (ScienceDirect). 


Electronic Subscriptions are Bundled : Several 
electronic publishers offer access to the entire range of their 
electronic journals and other publications bundled into one. 
For example IEL and ACM Digital Library offer access to their 
entire site on subscription. Access to individual journals or a 
subset is not permissible. Similarly, Academic Press offers all
journals available on its site (Academic’s Project IDEAL)
to library consortia for 10 percent more than the print
subscription.


Pay-per-Look : Publishers and aggregators have 
started experimenting with models wherein a user can search 
a database online for a modest usage fee, identify articles of 
interest, and then call up such articles in full text on a per- 
look basis. 




Electronic Only : A few publishers and aggregators 
have started offering only electronic version of their journals 
providing a modest discount for those who forego print 
versions. 


Consortium Licensing : Consortia provide the collective
strength to negotiate with electronic publishers for the best
possible price and rights. Most publishers already have
well-defined policies and offers for libraries subscribing as consortia.
Consortium licensing is widely used by libraries the world
over. It is slowly picking up in India as well.


National Licensing : National licenses can also be 
negotiated with electronic publishers for core collections. 
Singapore, Taiwan and UK have arranged national licenses 
for some of the important full text resources. 


Besides electronic journals, there are several online
databases that are now available through the web, including
MEDLINE (several versions), AGRICOLA and ERIC (all free).
Most online search services like STN and DIALOG also have
their web-based interfaces. Reference works like
encyclopedias, dictionaries, handbooks, atlases, etc., are also
making their electronic appearance on the web. Electronic
resources created exclusively for the web include web-based
educational tutorials called ‘online courseware’.


Online courseware is proliferating on the web as a
strong contender for distance education. Telecampus, Canada
(www.telecampus.edu/) lists more than 12,000 online
courseware offerings available on the web. Moreover, highly
specialized web sites are now coming up in various disciplines
which offer comprehensive information, including all kinds of
resources in electronic format. Ei Engineering Village
(http://www.ei.org/), ISI Electronic Library (http://www.isinet.com),
IEL (http://www.ieee.org/) and Engineering Sciences Data Unit
(http://www.esdu.com) are some of the important examples.




Electronic resources accessible on the web, for free or
for a fee, are undeniably a major and important constituent of a
digital library.


3.1.6. Converting Digital Datasets that are Born Digital 


The libraries or the institutions implementing digital 
libraries may have datasets that are originally created in digital 
format. Doctoral dissertations submitted to universities and 
research institutions are undisputedly highly valuable 
documents that qualify to be an important component of any 
digital library. Moreover, institutions may have in-house 
journals, annual reports, technical reports, or other datasets, 
that may be included in the digital collection. The items listed here
are invariably composed in a word processing
programme or in a desktop publishing package.


Documents composed in word processing
packages or desktop publishing packages can be converted
into HTML, PostScript and PDF using tools like Acrobat 4.0
or Acrobat Exchange. Online converters are also available
through Adobe’s site. HTML, as the de facto language of the
web, and PDF, as a de facto standard for online distribution of
electronic information, can be employed to facilitate the transition
from computer-processible files to a format accessible on the
web.


For regular publications, institutions may adopt
SGML or XML to provide structure and functionality to their
publications. Electronic publishers are increasingly using
SGML/XML to reap the benefits that the format offers. SGML or
XML documents provide the benefits of a database management
system without being one. Publishers code the accepted
submissions in SGML/XML in a semi-automated process
using an assortment of software packages available to them or
custom-made software specially designed for this
purpose. The database of SGML documents is used for




providing search by author, keyword, etc., and for browsing the
contents pages of journals. Behind the web interface lies a
relational database like Oracle that stores the SGML/XML
documents. Search and browse operations on highly structured
XML/SGML datasets produce dynamically generated web
pages (HTML on the fly). These HTML files provide links to the full
text of documents in HTML, PDF or PostScript; all formats are
generated dynamically from the same SGML/XML datasets
using pre-defined DTDs.
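
The idea of generating HTML on the fly from a structured record can be illustrated with a minimal sketch. It assumes a hypothetical XML article record; the element names (article, title, author, fulltext) are invented for illustration and do not come from any publisher's DTD. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article record; element names are illustrative, not a real DTD.
ARTICLE_XML = """<article>
  <title>Caching CD-ROM Databases</title>
  <author>A. Author</author>
  <author>B. Author</author>
  <fulltext format="PDF" href="/fulltext/0001.pdf"/>
  <fulltext format="HTML" href="/fulltext/0001.html"/>
</article>"""


def render_article(xml_text: str) -> str:
    """Build a small HTML page from one structured article record."""
    root = ET.fromstring(xml_text)
    title = escape(root.findtext("title", default=""))
    authors = ", ".join(escape(a.text or "") for a in root.findall("author"))
    # One link per available full-text format.
    links = " | ".join(
        '<a href="{}">{}</a>'.format(escape(f.get("href", "")), escape(f.get("format", "")))
        for f in root.findall("fulltext")
    )
    return "<html><body><h1>{}</h1><p>{}</p><p>{}</p></body></html>".format(
        title, authors, links
    )


page = render_article(ARTICLE_XML)
print(page)
```

In a production system the record would come from the database rather than from a string constant, and the page template would be far richer; the sketch shows only that one structured source can drive the generated display.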


3.1.7. Conversion of Existing Print Media into Digital Format 


Several digital library projects are concerned with 
providing digital access to materials that already exist with 
traditional libraries in printed media. Scanned page images
are practically the only reasonable solution for institutions
such as libraries that need to convert existing paper collections or
legacy documents without having access to the original data
in computer-processible formats convertible into HTML/SGML
or any other structured or unstructured text. Scanned
images are the natural choice for large-scale conversions in major
digital library initiatives. Printed text, pictures and figures are
transformed into computer-processible forms using a digital 
scanner or a digital camera in a process called document 
imaging or scanning. The digitally scanned images are stored 
in a file as a bit-mapped page image, irrespective of the fact 
that a scanned page may contain a photograph, a line drawing 
or text. A bit-mapped page image is a type of computer
graphic, literally an electronic picture of the page, which can
most easily be equated to a facsimile image of the page.
As such, page images can be read by humans but not by
computers, i.e., the ‘text’ in a page image is not searchable on a
computer using present-day technology. An image-based
implementation requires a large amount of space for data storage and
transmission. There are several large projects using page
images as their primary storage format, including project




JSTOR (www.jstor.org) at Princeton University, USA, funded
by the Mellon Foundation. Project JSTOR has a complete
set of more than 120 journals scanned and hosted on web
servers that reside at the University of Michigan and are
mirrored at Princeton University, USA. Using technology
developed at Michigan, high-resolution bit-mapped images
of each page are linked to a text file generated with OCR
software. Linking a searchable text file to the page images of
the entire published record of a journal, along with newly
constructed tables of contents and indexes, permits a level of
access, search and retrieval of the journal content that was
previously unimaginable.


Capturing the page image format is comparatively easy and
inexpensive; it is a reproduction of a page that maintains page
integrity and originality. Scanned textual images,
however, are not searchable unless they are OCRed, which is,
in itself, a highly error-prone process, especially when it involves
scientific texts.


3.1.8. Print to Digital : Options for Conversion 


Digital imaging technology offers a number of choices
that can be adopted depending on the objective of scanning,
the end users, the availability of finances, etc. There are four
basic approaches that can be adopted to convert from print
to digital:


• Scanned as image,
• OCR,
• Retaining page layout using Acrobat Capture, and
• Re-keying the data.


Scanned as Image : Image only is the lowest-cost
option, in which each page is an exact replica of the original
source document. Since OCR is not carried out, the document
is not searchable. Most scanning software generates TIFF




format by default, which can be converted into PDF using a
number of software tools. Scan to TIFF/PDF format is
recommended only when the requirement of project is to make 
documents portable and accessible from any computing 
platform. The images can be browsed through a table of 
contents composed in HTML providing link to scanned 
images. 

OCR Programs : OCR or Optical Character
Recognition programs are software tools used to transform
scanned textual page images into a word processing file.
OCR or text recognition is the process of electronically
identifying text in a bit-mapped page image or set of images
and generating a file containing that text in ASCII code or in a
specified word processing format, leaving the image intact in
the process. OCR is performed in order to make every
word in a scanned document readable and fully searchable
without having to key everything into the computer manually.
Once a bit-mapped page image has gone through the process
of OCR, the document can be manipulated and managed by
its contents.


Retaining Layout after OCR : Several software
packages now offer the facility of retaining the page layout after
a page has been OCRed. The process for retaining the page layout
is software dependent. Caere’s Omnipro offers two ways of
retaining page layout following OCR. It calls them True Page
Classic and True Page Easy Edit. True Page Classic places each
paragraph within a separate frame of the word processor file into
which the OCR output is saved. If one wishes to edit anything
subsequently, the relevant paragraph box may need to
be resized. True Page Easy Edit, however, facilitates editing of pages
without the necessity of resizing the boxes, although there is
a greater chance of text spilling over the page. Xerox Text Bridge
offers a similar feature called DocuRT, which is broadly
equivalent to True Page Easy Edit. Such software
dismantles the page, OCRs it, and then reassembles it in such
a way that all the component parts, such as tabs, columns,
tables and graphics, can be used in a text manipulation package
such as a word processor.


[Figure: flow diagram with the labels Scanner, Document image, Software/hardware, Contextual processor, Recognition results, Output interface, To application]


Fig. 3.4. Examples of OCR Technologies


The process of OCR results in a computer-processible
file that is less accurate than re-keying the data. At an
accuracy rate of 98 percent, a page having 1800 characters
will contain 36 errors on average. It is, therefore,
imperative to clean up after OCR, unless the original scanned
image will be viewed as the page and the OCR is used purely for
creating a searchable index of words that will be searched



via a fuzzy retrieval engine like Excalibur which is highly 
tolerant to OCR errors. 
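
The arithmetic behind the figure of 36 errors per page can be checked with a short sketch. The function name is illustrative, and the calculation assumes that errors are counted per character and spread evenly over the page.

```python
def expected_ocr_errors(chars_per_page: int, accuracy: float) -> float:
    """Expected number of wrong characters on a page at a given accuracy rate."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction between 0 and 1")
    return chars_per_page * (1.0 - accuracy)


# The example from the text: 1800 characters at 98 percent accuracy.
print(round(expected_ocr_errors(1800, 0.98)))
```

The same function shows why small gains in accuracy matter: at 99.9 percent the same page would be expected to contain fewer than two errors.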


Another possibility for cleaner OCR output is the use of a
specialist OCR system such as Prime Recognition. With
production OCR in mind, Prime OCR licenses leading
recognition engines and passes the data through several of
them, using voting technology along with artificial intelligence
algorithms. The use of multiple OCR engines improves the
result achieved by a single engine by 65-80 percent. The
technology is priced according to the number of
recognition engines that one would like to incorporate. Michigan
Digital Library Production Services used Prime OCR for
placing more than two million pages of SGML-encoded text
and the same number of page images on the web.
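
The voting idea can be sketched in a few lines. This is not Prime Recognition's algorithm, which is not described in the text; it is a simple character-by-character majority vote that assumes the outputs of the engines are already aligned and of equal length. A position on which no majority exists is marked with a tilde.

```python
from collections import Counter


def vote(outputs: list[str], unknown: str = "~") -> str:
    """Combine aligned OCR outputs by majority vote at each character position."""
    if not outputs:
        raise ValueError("at least one OCR output is required")
    if any(len(o) != len(outputs[0]) for o in outputs):
        raise ValueError("this sketch assumes aligned, equal-length outputs")
    result = []
    for chars in zip(*outputs):
        winner, count = Counter(chars).most_common(1)[0]
        # Accept a character only when more than half of the engines agree.
        result.append(winner if count > len(outputs) / 2 else unknown)
    return "".join(result)


# Three hypothetical engines, each making a different mistake.
print(vote(["digital", "d1gital", "digitaI"]))
```

Real engines disagree about spacing and segmentation as well as about individual characters, so a production system would have to align the outputs before voting.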


Retaining Page Layout using Acrobat Capture : The 
Acrobat Capture 2.0 provides several options for retaining 
not only the page layout but also the fonts, and to fit text into 
the exact space occupied in the original, so that the scanned 
and OCRed copy never over- or under-shoots the page. It 
treats unrecognizable text as pasted-in images, which are 
perfectly readable by anyone looking at the PDF file, but which 
will be absent from the editable and searchable text file. In 
contrast, ordinary OCR programs treat unrecognized text as 
tildes or some other special character in the ASCII output. 
Acrobat Capture can be used to scan pages as images,
as image+text and as normal PDF; all three options retain
the page layout.


In the image+text option, OCRed text is generated for
each page image. Each page remains an exact replica of the original
and is left untouched; the OCRed text sits behind the
image and is used for searching. The OCRed text is generally
not corrected for errors since it is used only for searching.
The cost involved is much less than PDF Normal. However, 
the entire page is a bitmap and neither fonts nor line drawings 




are vectorized, so the file size of Image + Text PDFs is
considerably larger than the corresponding PDF Normal files 
and pages will not display as quickly or cleanly on screen. 


PDF Normal gives the clearest on-screen display, which
is searchable, with a significantly smaller file size than
Image+Text. The result is not, however, an exact replica of
the scanned page. While all graphics and formatting are
preserved, substitute fonts may be used where direct matches
are not possible. It is a good choice when files need to be
posted to the web or otherwise delivered online. If, during the
capture and OCR process, a word cannot be recognized to
the specified confidence level, Capture, by default, substitutes
it with a small portion of the original bitmap image. Capture’s
‘best guess’ of the suspect word lies behind the bitmap so
that searching and indexing are still possible. However, one
cannot guarantee that these bit-mapped words are correctly
guessed. In addition, the bitmap is somewhat obtrusive,
detracting from the look of the page. Capture also provides
an option to correct the suspected errors left as bit-mapped images
or to leave them untouched.


Re-keying : A classic solution of this kind
comprises keying in the data and verifying it. This
involves a complete keying of the text, followed by a full
re-keying by a different operator; the two keying operations
may also take place simultaneously. The two keyed-in files are
compared and any errors or inconsistencies are corrected.
This would guarantee at least 99.9 percent accuracy, but
reaching the 99.95 percent accuracy level would normally require
full proof-reading of the keyed-in files, plus table lookups and
dictionary spellchecks.
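
The comparison of the two keyed-in files can be sketched with the standard difflib module. The word-level comparison and the function name are assumptions made for illustration; a keying bureau's verification software may work quite differently.

```python
import difflib


def keying_discrepancies(first: str, second: str) -> list[tuple[str, str]]:
    """Return (first operator, second operator) pairs wherever the keyings differ."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


# Two hypothetical keyings of the same line.
for left, right in keying_discrepancies(
    "the digital library of theses", "the digtal library of theses"
):
    print("operator 1:", repr(left), "| operator 2:", repr(right))
```

Each reported pair would then be resolved by checking the original document, since the comparison shows only that the operators disagree, not which of them is right.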


3.1.9. Steps in the Process of Digitization


The following four steps are involved in the process of digitization:
scanning, indexing, storage, and retrieval. Software, variously
called document image processing (DIP), electronic filing



system (EFS) and document management system (DMS),
provides some or all of these functions.


Scanning : The scanning process involves acquisition of
an electronic image of an original, which may be a
photograph, text, manuscript, etc., into the computer using
an electronic image scanner. An image is read or scanned at
a predefined resolution and dynamic range. The resulting file,
called a ‘bit-map page image’, is formatted and tagged for
storage and subsequent retrieval by the software package
used for scanning. A fax card, electronic camera or other
imaging device may also be employed for acquisition of an
image; however, image scanners are the most important and
most commonly used component of an imaging system for
converting normal paper-based documents into images.


Indexing : Indexing of a document converted into an
image or text file is the second step in the process of document
imaging. The process of indexing scanned images involves
linking a database of scanned images to a text database.
Scanned images are just like a set of pictures that need to be
related to a text database describing them and their contents.
An imaging system typically stores a large amount of
unstructured data and uses a two-file system for storing and retrieving
scanned images. The first is a traditional file that has a text
description of the image along with a key to a second file.
The second file contains the document location. The user
selects a record from the first file using a search algorithm.
Once the user selects a record, the application keys into the
location index, finds the document and displays it.
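
The two-file arrangement described above can be modelled with two small in-memory structures. All record contents, keys and storage paths below are hypothetical, and a real system would keep both files in a database rather than in Python variables.

```python
# File 1: descriptive records, each carrying a key into file 2 (hypothetical data).
DESCRIPTIONS = [
    {"key": 1, "text": "Annual report 1998, cover page"},
    {"key": 2, "text": "Doctoral dissertation on metadata, title page"},
    {"key": 3, "text": "Annual report 1999, cover page"},
]

# File 2: location index mapping each key to where the image is stored.
LOCATIONS = {
    1: "jukebox1/disc03/0001.tif",
    2: "jukebox1/disc07/0412.tif",
    3: "jukebox2/disc01/0009.tif",
}


def find_records(term: str) -> list[dict]:
    """Search the descriptive file for records whose text mentions the term."""
    return [r for r in DESCRIPTIONS if term.lower() in r["text"].lower()]


def locate(record: dict) -> str:
    """Use the record's key to look up the image location in the second file."""
    return LOCATIONS[record["key"]]


for record in find_records("annual report"):
    print(record["text"], "->", locate(record))
```

Keeping the descriptions separate from the locations means that images can be moved to another storage device by updating the second file only.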


Most document imaging software packages,
through their menu-driven or command-driven interfaces,
facilitate elaborate indexing of documents. While some DMS
facilitate selection of indexing terms from the image file, others
allow only manual keying in of indexing terms. Further, many
DMS packages provide OCR capabilities for transforming




the images into standard ASCII files. The OCRed text then 
serves as a database for full text search of the stored images. 


Storage : The most tenacious problem of a document
image relates to its file size and, therefore, to its storage.
Every part of an electronic page image is saved regardless
of the presence or absence of ink. The file size depends on
the scanning resolution, the size of the area being digitized,
the compression ratio, the content and the graphic file format
used to save the image. The scanned images, therefore, need
to be transferred from the hard disc of the scanning workstation
to an external large-capacity storage device such as an
optical disc, CD-ROM/DVD-ROM disc, cartridge tape, etc.
While smaller document imaging systems use offline
media, which need to be reloaded when required, or fixed
hard disc drives allocated for image storage, larger DMS use
auto-changers such as optical jukeboxes and tape library
systems. The image storage device may be either remote from or
local to the retrieval workstation, depending upon the imaging
system and DMS used.
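
How resolution, scanned area and bit depth drive the file size can be estimated with a rough calculation. The page size and resolutions in the example are assumptions chosen for illustration, and the sketch ignores compression, row padding and header overhead.

```python
import math


def uncompressed_size_bytes(width_in: float, height_in: float,
                            dpi: int, bits_per_pixel: int) -> int:
    """Estimate the raw size of a scanned image before any compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return math.ceil(pixels * bits_per_pixel / 8)


# An 8.5 x 11 inch page scanned as a 1-bit (black and white) image.
for dpi in (300, 600):
    size = uncompressed_size_bytes(8.5, 11, dpi, 1)
    print(dpi, "dpi:", size, "bytes")
```

Doubling the resolution quadruples the number of pixels, which is why storage planning is so sensitive to the scanning resolution chosen at the outset.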


Retrieval : Once scanned images and OCRed text
documents have been saved as files, a database is needed
for selective retrieval of data contained in one or more fields
within each record in the database. Typically, a document
imaging system uses at least two files to store and retrieve
documents. The first is a traditional file that has a text description
of the image along with a key to the second file. The second
file contains the document location. The user selects a record
from the first file using a search algorithm. Once the user
selects a record, the application keys into the location index,
finds the document and displays it. Most DMS provide
elaborate search possibilities, including the use of Boolean
operators (and, or, not), proximity operators and wild cards. Users are
also allowed to refine their search strategy. Once the required
records have been identified, their associated document
images can quickly be retrieved from the image storage device


for display or printed output. 
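
A Boolean search with wild cards over the descriptive records can be sketched as follows. The records and the three-argument interface are hypothetical and are not modelled on any particular DMS; wild-card matching is done with the standard fnmatch module, and proximity operators are left out.

```python
from fnmatch import fnmatchcase

# Hypothetical descriptive records keyed by image identifier.
RECORDS = {
    "img001": "annual report of the central library 1998",
    "img002": "technical report on digital preservation",
    "img003": "doctoral dissertation on digital libraries",
}


def matches(text: str, pattern: str) -> bool:
    """True if any word of the text matches the (possibly wild-carded) pattern."""
    return any(fnmatchcase(word, pattern.lower()) for word in text.lower().split())


def boolean_search(all_of=(), any_of=(), none_of=()) -> list[str]:
    """AND the all_of terms, OR the any_of terms, and exclude the none_of terms."""
    hits = []
    for image_id, text in RECORDS.items():
        if (all(matches(text, p) for p in all_of)
                and (not any_of or any(matches(text, p) for p in any_of))
                and not any(matches(text, p) for p in none_of)):
            hits.append(image_id)
    return hits


print(boolean_search(all_of=["digital", "librar*"]))
print(boolean_search(any_of=["report", "dissertation"], none_of=["digital"]))
```

Refining a search strategy, as mentioned above, amounts to calling the search again with additional or stricter terms.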


The digitally scanned images are stored in a file as a 
bit-mapped page image, irrespective of the fact that a scanned 
page contains a photograph, a line drawing or text. The 
scanned image can be formatted and tagged in dozens of 
different formats to facilitate easy storage and retrieval 
depending upon the scanner and its software. National and 
international standards for image-file formats and 
compression methods exist to ensure that data will be 
interchangeable amongst systems. An image file stores
discrete sets of data and information allowing a computing
system to display, interpret and print the image in a
pre-defined format. An image file format consists of three distinct
components: a header, which stores information on the file
identifier and image specifications such as size, resolution,
compression protocols, etc.; image data, consisting of a
look-up table and the image raster; and lastly a footer that signals
file termination. While the bit-mapped portion of a raster
image is standardized, it is the file header that differentiates
one format from another. The display software of a raster
image picks up the details, like resolution, compression 
technique, etc., from the file header and displays an electronic 
replica of the original page. File formats also define the 
compression protocol used for compressing or 
decompressing an image. 
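
Reading a file header can be illustrated with TIFF, the format mentioned earlier as the default output of most scanning software. The sketch below reads only the fixed eight-byte TIFF header (byte order, the magic number 42 and the offset of the first image file directory). In TIFF, details such as resolution and compression are held in tagged directory entries reached from that offset, which the sketch does not parse. The sample bytes are constructed by hand for illustration.

```python
import struct


def read_tiff_header(data: bytes) -> dict:
    """Parse the fixed 8-byte header found at the start of a TIFF file."""
    if len(data) < 8:
        raise ValueError("a TIFF header is 8 bytes long")
    order = data[:2]
    if order == b"II":
        prefix = "<"  # little-endian ("Intel") byte order
    elif order == b"MM":
        prefix = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: unknown byte-order mark")
    magic, ifd_offset = struct.unpack(prefix + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: magic number is not 42")
    return {
        "byte_order": "little-endian" if prefix == "<" else "big-endian",
        "first_ifd_offset": ifd_offset,
    }


# Hand-made header: little-endian, magic 42, first directory at byte 8.
print(read_tiff_header(b"II*\x00\x08\x00\x00\x00"))
```

Display software performs a check of this kind before it attempts to interpret the raster data that follows.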


Different file formats are used to store different media 
types like text, images, graphics, pictures, musical works, 
computer programs, databases, models and designs, video 
programs and compound works combining many types of 
information. Although almost every type of information can 
be represented in digital form, a few important file formats for 
text and images typically applicable to a library-based digital 
library are described below. However, every object in a digital 
library needs to have a name or identifier which distinctly 
identifies its type and format. This is achieved by assigning 




file extensions to the digital objects. File extensions in a
digital library typically denote the formats, protocols and rights
management that are appropriate for the type of material.
Names of file formats applicable in digital libraries and their
file extensions are given in Table 3.1.
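
How software infers the format of an object from its file extension can be shown with Python's standard mimetypes module. The file names are hypothetical, and the mapping used is the one shipped with Python, not the contents of Table 3.1.

```python
import mimetypes


def describe_object(filename: str) -> str:
    """Guess the media type of a digital object from its file extension."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type or "unknown"


# Hypothetical object names as they might appear in a digital collection.
for name in ("thesis.pdf", "index.html", "notes.xyzunknown"):
    print(name, "->", describe_object(name))
```

An extension is only a naming convention, so systems that need certainty also inspect the file header.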


3.1.10. Formats and Encoding Used for Text 


Text-based contents of a digital library can be stored
and presented as: (i) simple text or ASCII; (ii) unstructured
text; and (iii) structured text (SGML, HTML or XML).


Simple Text or ASCII : Simple text or ASCII (American
Standard Code for Information Interchange) is the most
commonly used encoding scheme for facilitating exchange
of data from one software package to another or from one platform to
another. The “full text” of articles from many journals has been
available electronically through online vendors like DIALOG
and STN in this format for over two decades. Typically, what
is stored is the text of each article, broken into paragraphs,
along with bibliographic information, in a simple tagged
form.


Simple text or ASCII is compact, economic to capture 
and store, searchable, inter-operable and is malleable with 
other text-based services. But the simple text or ASCII cannot 
be used for displaying complex tables or mathematical 
formulas. Photographs, diagrams, graphics, special 
characters also cannot be displayed in ASCII. ASCII format 
does not store text formatting information, i.e., italics, bold, 
font type, font size or paragraph justification information. Thus, 
simple text or ASCII in many ways is inadequate to represent 
many journal articles because of the mentioned reasons. 
Although simple text or ASCII is extremely useful for searching
and selection, its inability to capture the richness of the original
makes it an interim step to structured text formats.
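
The loss of special characters in plain ASCII can be demonstrated directly. The sketch below encodes a string as ASCII, replacing anything outside the 128-character set with a question mark, and reports which characters were lost; the function name is illustrative.

```python
def to_ascii(text: str) -> tuple[str, list[str]]:
    """Return the ASCII rendering of a string and the characters it cannot hold."""
    lost = sorted({ch for ch in text if ord(ch) > 127})
    rendered = text.encode("ascii", errors="replace").decode("ascii")
    return rendered, lost


rendered, lost = to_ascii("E = mc²")
print(rendered, lost)
```

Formatting such as italics or bold never reaches this stage at all, since plain text has no way of recording it in the first place.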


Structured Text Format : Structured text attempts to




capture the essence of documents by “marking up” the text
so that the original form can be recreated, or other forms
such as ASCII produced. Structured text formats have
provision to embed images, graphics and other multimedia
formats in the text. SGML (Standard Generalized Markup
Language) is one of the most important and popular structured
text formats. ODA (Office Document Architecture) is a similar
and competing standard. SGML is an international standard
around which several related standards are built. SGML is a
flexible language that gave birth to HTML (Hypertext Markup
Language), the de facto markup language of the World Wide
Web, which controls the display format of documents and even
the appearance of the user interface for interacting with the
documents. Like simple text or ASCII, structured text can be
searched or manipulated. It is highly flexible and suitable both
for electronic and paper production. Well-formatted text
enhances the visual presentation of textual, graphical and pictorial
information. Structured formats can easily display
complex tables and equations. Moreover, structured text
is compact in comparison to image-based formats, even
after including embedded graphics and pictures.


The creation of structured text, if rekeyed, is always 
too expensive to do on a production basis. However, creation 
of structured text is generally integrated with the production 
of printed artifacts. SGML is generally a format generated as 
a by-product of printed artifacts generated electronically. 


Besides SGML and HTML, there are other formats used
in digital library implementations. TeX, used for formatting
highly mathematical text, is one such format; it allows
greater control over the resulting display of the document,
including review of formatting errors.


Page Description Language (PDL) : Page description
languages (PDLs), such as Adobe’s PostScript and PDF
(Portable Document Format), are similar to images, but the




formatted pages displayed to the user are text-based rather
than image-based. Acrobat’s Portable Document Format
(PDF) is a by-product of PostScript, Adobe’s page-description
language. While PostScript is a programming language, PDF
is a page-description format. A PDF file
can take two forms: (i) text-based PDF, which uses the
outline font technology of the PostScript PDL
from Adobe to describe the format of a page; and (ii)
raster-scanned image PDF without the text output of OCR (Optical
Character Recognition). The image PDF is essentially
equivalent to the TIFF or CCITT G4 formats, or to a photograph,
in that its text characters cannot be manipulated by the
computer. Besides these, an image PDF may also carry hidden text
from OCR output, or a text-based PDF may be generated from an
image PDF after it goes through the process of OCR and the
scanned image is replaced by text with fonts and layout
matching the scanned document.


In contrast to PDF, PostScript is a powerful and dynamic
programming language that allows a tremendous range of
interpretation in various applications and provides the ability
to produce visually identical pages by any number of means.
However, a PostScript file must be read from beginning to
end, because a programming expression on page one,
declaring what fonts are being used, for example, can affect
page 15. One cannot just extract page 15 and expect it to
stand alone.


PDF, on the other hand, as a page description format, 
is page independent. Every page has all the information 
needed to display that page. Because it does not allow all the 
complex computational alternatives of PostScript, PDF code 
is much more consistent and predictable, whereas the 
PostScript description of a page will be different depending 
on the software package which is used originally to create 
the page. 


The reason for that consistency is that PDF is created 
by a PostScript interpreter called the Distiller, which converts 
the PostScript produced by an application program into a very 
standard PDF description of the resulting page. The resulting 
PDF retains all the information about what that page is 
supposed to look like. In addition to fonts, PDF represents all 
the other visual aspects of the page—line breaks, layout, white 
space, graphics, colours — every visual feature of the page. 
And because it is a vector technology rather than the bit-map 
associated with scanned images, the resulting files are very 
compact and yet the fonts and graphics adapt to the resolution 
of the output device. 


3.1.11. Converting PostScript Files to PDF Files 


A PostScript print file can be converted into a PDF file 
using Adobe Acrobat. Adobe Acrobat can be installed on a 
computer in such a way that Acrobat appears to be a printer. 
When Acrobat is selected as the printer, the PostScript PDL 
print file is automatically sent to Acrobat, which converts the 
PostScript file to a PDF file. This initial PDF file is an outline 
font based file. The outline font file can be converted to a 
raster file either by using the Adobe Acrobat ‘print-to-image’ 
command to print the PDF as a raster image, or by opening 
the PDF file in Adobe Photoshop and exporting it as 
a raster image. 
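
The same PostScript-to-PDF step can also be scripted. The sketch below is only an illustration and does not use Acrobat: it assumes that the free Ghostscript interpreter is installed and available as `gs`, and the function names and file names are hypothetical. 

```python
import shutil
import subprocess
from pathlib import Path


def ps_to_pdf_command(ps_path, pdf_path, gs_binary="gs"):
    """Build a Ghostscript command line that converts a PostScript file to PDF."""
    return [
        gs_binary,
        "-dBATCH",            # exit after the last file instead of prompting
        "-dNOPAUSE",          # do not pause between pages
        "-sDEVICE=pdfwrite",  # Ghostscript's PDF output device
        f"-sOutputFile={pdf_path}",
        str(ps_path),
    ]


def convert_ps_to_pdf(ps_path, pdf_path, gs_binary="gs"):
    """Run the conversion; raises if Ghostscript is missing or the run fails."""
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"{gs_binary!r} not found on PATH")
    subprocess.run(ps_to_pdf_command(ps_path, pdf_path, gs_binary), check=True)
    return Path(pdf_path)


# Show the command that would be run for a hypothetical print file.
print(" ".join(ps_to_pdf_command("article.ps", "article.pdf")))
```

The command is built separately from the call that runs it, so that it can be inspected or logged before any file is touched. 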


Page Image Format : Digitally scanned images 
are stored in a file as a bit-mapped page image, irrespective 
of whether the scanned page contains a photograph, a line 
drawing or text. The bit-mapped page image can be created 
in dozens of different formats, depending upon the scanner 
and its software. National and international standards for 
image-file formats and compression methods exist to ensure 
that data will be interchangeable amongst systems. An image 
file stores discrete sets of data and information allowing a 
computing system to display, interpret and print the image in 
a pre-defined fashion. An image file format consists of three 
distinct components: a header, which stores the file 
identifier and image specifications; the image data, consisting 
of a look-up table and the image raster; and a footer, which 
signals the termination of the file. While the bitmapped portion 
of a raster image is standardized, it is the file headers that 
differentiate one format from another. 
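
As an illustration of how a header identifies a format, the sketch below reads the fixed 8-byte header of a TIFF file: a byte-order mark (`II` or `MM`), the number 42, and the offset of the first image file directory. The function name is hypothetical and the sample bytes are constructed by hand rather than taken from a real scan. 

```python
import struct


def read_tiff_header(data):
    """Return (byte_order, first_ifd_offset) from the first 8 bytes of a TIFF file."""
    if len(data) < 8:
        raise ValueError("a TIFF header is 8 bytes long")
    order = data[:2]
    if order == b"II":
        endian = "<"   # little-endian ("Intel") byte order
    elif order == b"MM":
        endian = ">"   # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    # A 2-byte magic number followed by a 4-byte offset.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: magic number is not 42")
    return order.decode("ascii"), ifd_offset


# Hand-built header: little-endian, first directory at byte 8.
sample = b"II" + struct.pack("<HI", 42, 8)
print(read_tiff_header(sample))
```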


TIFF (Tagged Image File Format) is the most commonly 
used page image file format and it is considered to be the de 
facto standard for bitonal images. Some image formats 
developed by commercial vendors require specific software 
or hardware for display and printing. Images can be coloured, 
grey-scale or black and white, called bitonal. They can be 
uncompressed or compressed using several different 
compression algorithms. 


Image files are much larger than text files, thus 
compression is necessary for their economic storage. A 
compression algorithm reduces a redundant string, such as 
one or more rows of white bits, to a single code. The standard 
compression scheme for black and white (bitonal) images is 
the one developed by the International Telecommunications 
Union (ITU), formerly the Consultative Committee for 
International Telephony & Telegraphy (CCITT), for Group 4 fax 
images, commonly referred to as CCITT Group 4 (G-4) or 
ITU-G-4. An image created as a TIFF and compressed using 
CCITT Group 4 is called a TIFF G4, which is the de facto 
standard for storing bitonal images. 
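
The idea of reducing a run of identical bits to a single code can be shown with plain run-length encoding. This is a deliberately simplified sketch and not the CCITT Group 4 algorithm itself, which codes each scan line relative to the line above it and uses variable-length code words. The function names are hypothetical. 

```python
def run_lengths(row):
    """Encode a row of 0/1 pixels as alternating run lengths, starting with white (0)."""
    runs = []
    current = 0      # by convention the first run counts white pixels
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def expand_runs(runs):
    """Rebuild the pixel row from alternating white/black run lengths."""
    row = []
    colour = 0
    for length in runs:
        row.extend([colour] * length)
        colour = 1 - colour
    return row


row = [0] * 10 + [1] * 3 + [0] * 7      # 20 pixels: white, black, white
encoded = run_lengths(row)
print(encoded)
assert expand_runs(encoded) == row      # the encoding is lossless
```

A row that begins with a black pixel is encoded with a leading white run of length zero, so the decoder always knows which colour comes first. 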


Table 3.1 : File Formats in Digital Libraries 

Abbreviation  Format                                                 File Extension 

File Format for Unstructured Text 
ASCII         American Standard Code for Information Interchange     .txt 

File Format for Structured Text 
SGML          Standard Generalized Markup Language                   .sgml 
HTML          Hypertext Markup Language                              .html 
XML           Extensible Markup Language                             .xml 
PDF           Portable Document Format (Adobe)                       .pdf 
PostScript    PostScript (Adobe)                                     .ps 
TeX           TeX Format                                             .tex 

File Format for Images 
PDF           Portable Document Format                               .pdf 
BMP           Bitmap (Windows)                                       .bmp 
IMG           Ventura Publisher                                      .img 
JPEG          Joint Photographic Experts Group                       .jpg 
JFIF          JPEG File Interchange Format                           .jfif 
PCP           PC Paint (B&W Mode)                                    .pcp 
PCX           PC Paintbrush (Color & B&W)                            .pcx 
PSD           Photoshop                                              .psd 
TGA           Truevision Targa                                       .tga 
PNG           Portable Network Graphics                              .png 
TIFF          Tagged Image File Format                               .tif 
TIFF-G4       Tagged Image File Format with Group 4 Fax Compression  .tif 
SPIFF         Still Picture Interchange File Format                  .spf 
PCD           Photo CD (Kodak)                                       .pcd 

Web-Compatible Image File Format 
GIF           Graphics Interchange Format                            .gif 
JPEG          Joint Photographic Experts Group                       .jpg 
JFIF          JPEG File Interchange Format                           .jfif 

Audio File Format 
WAVE          Waveform Audio (Microsoft)                             .wav 
AIFF          Audio Interchange File Format                          .aif 
VOC           Creative Voice                                         .voc 
MIDI          Musical Instrument Digital Interface                   .midi 
SND           Sound                                                  .snd 
AU            Audio (Sun Microsystems)                               .au 
RA            Real Audio Format (Progressive Networks)               .ra 

AVI           Audio Video Interleave                                 .avi 
FLA           Macromedia Flash Movie                                 .fla 
FLC           AutoDesk FLIC Animation                                .flc 
MOV           QuickTime for Windows Movie                            .mov 
MPEG          Moving Picture Experts Group                           .mpg 
MP2           MPEG Audio Layer 2                                     .mp2 
MP3           MPEG Audio Layer 3                                     .mp3 


Some of the formats mentioned above are maintained 
and developed by international organizations such as the 
International Organization for Standardization (ISO) and the 
International Telecommunication Union (ITU). 


Most scanning software allows scanned images to be 
saved in a number of formats. TIFF (Tagged Image File 
Format) is the most commonly used file format and is 
considered the de facto standard for bitonal scanning. TIFF is a 
truly multi-platform format and is a good candidate for 
scanning projects. Some image formats are proprietary, 
developed and supported by a commercial vendor, and require 
specific software or hardware for displaying and printing 
scanned images. 


It would be appropriate to store a high-resolution image 
as a TIFF master (archival format) and distribute the image 
as a JPEG / GIF file (access format). Software is now available 
that generates JPEG or GIF files ‘on the fly’ from a 
master TIFF file. 


3.1.12. Organizing Digital Projects 


Once saved, the discs containing digital objects need to be 
organized, because a disc full of digital images without any 
organisation or browse and search options may have no 
meaning except to the person who created it. Scanned images 
need to be organised in order to be useful. Moreover, images 
need to be linked to the associated metadata to facilitate 
their browsing and searching. The following steps describe 
the process of organizing the digital images : 


(a) Hierarchical Storage : Organize the scanned image 
files into a disc hierarchy that logically maps the physical 
organization of the documents. For example, in a project on 
scanning of journals, create a folder for each journal, which, 
in turn, may have a folder for each volume scanned. Each 
volume, in turn, may have a subfolder for each issue. The 
folder for each issue may contain the scanned articles 
that appeared in the issue, along with a contents page 
composed in HTML that provides links to the articles in that issue. 


(b) Nomenclature : Name the scanned image files in a 
strictly controlled manner that reflects their logical relationship. 
For example, each article may be named after the surname 
of the first author followed by the volume number and issue 
number. ‘dhimanakv2n1.pdf’ conveys that the article is by 
‘A K Dhiman’ and appeared in volume 2, issue no. 1 of 
‘PEARL - A Journal of Library Science’. The file name for each 
article would, therefore, convey the logical and hierarchical 
organization of the journal. 


(c) Description : Describe the scanned image files 
internally, using the image header, and externally, using linked 
descriptive metadata files. Most software packages provide 
for storing ‘administrative data’ about an image (date of 
creation, format type and version, compression technique, 
name of creator, etc.) in the image header. ‘Structured’ or 
‘descriptive’ metadata for images are the keywords assigned 
to each image. 


(d) Table of Contents : The simplest and most effective 
method of providing access is a table of contents that 
links each item to its respective object or image. Contents 
pages of journal issues, done in HTML, would offer a 
browsing facility. Full-text search of HTML pages or OCRed 
pages can be achieved by installing one of the free Internet 
search engines, such as: 

- Oingo Free Search (http://www.oingo.com/oingo_free_search/products.html) 
- Swish-E (http://www.berkeley.edu/SWISH-E/) 
- WhatUSeek (http://intra.whatuseek.com/) 
- Excite (http://excite.com/) 
- Google (http://www.google.com) 
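
The storage, naming and contents-page steps described above can be sketched in a few lines of code. This is a minimal illustration only: the folder layout (`vol02/issue01`), the journal folder name `pearl` and all function names are assumptions made for the example, not a prescribed scheme. 

```python
from html import escape
from pathlib import PurePosixPath


def issue_folder(journal, volume, issue):
    """Folder path of the form journal/vol02/issue01 (this layout is an assumption)."""
    return PurePosixPath(journal) / f"vol{volume:02d}" / f"issue{issue:02d}"


def article_filename(surname, initials, volume, issue, ext="pdf"):
    """Name such as 'dhimanakv2n1.pdf': surname + initials + v<volume>n<issue>."""
    stem = f"{surname}{initials}".lower().replace(" ", "")
    return f"{stem}v{volume}n{issue}.{ext}"


def contents_page(title, articles):
    """Build a minimal HTML contents page; articles is a list of (title, filename)."""
    items = "\n".join(
        f'  <li><a href="{escape(name, quote=True)}">{escape(text)}</a></li>'
        for text, name in articles
    )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )


name = article_filename("Dhiman", "AK", 2, 1)
folder = issue_folder("pearl", 2, 1)
print(folder / name)
print(contents_page("Volume 2, Issue 1", [("Sample article", name)]))
```

Generating file names and contents pages from the same volume and issue numbers keeps the folder hierarchy, the names and the links consistent with one another. 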

Large scanning projects would, however, require a 
back-end database storing the images or links to the images, 
together with their descriptive and administrative metadata. 
The back-end databases used by most document management 
systems (DMS) provide the functionality required by most 
web applications. Important document management systems 
such as FileNet have now integrated their databases with HTML 
conversion tools. Further, some DMS vendors have signed 
up with Adobe to incorporate Acrobat and Acrobat Capture 
into their web-based DMS. These databases accept queries 
from users through ‘HTML forms’ and generate search results 
on the fly. 
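
A back-end database of this kind can be pictured with a small sketch. The example below uses an in-memory SQLite database in place of a full document management system; the table layout, column names and sample records are all hypothetical. 

```python
import sqlite3


def build_catalogue(records):
    """Create an in-memory catalogue; each record is (path, title, creator, keywords)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE images ("
        " id INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL,"       # link to the image file, not the image itself
        " title TEXT,"
        " creator TEXT,"
        " keywords TEXT)"
    )
    db.executemany(
        "INSERT INTO images (path, title, creator, keywords) VALUES (?, ?, ?, ?)",
        records,
    )
    db.commit()
    return db


def search(db, term):
    """Return (title, path) rows whose title or keywords contain the term."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT title, path FROM images"
        " WHERE title LIKE ? OR keywords LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return rows.fetchall()


db = build_catalogue([
    ("vol02/issue01/a.tif", "Digital preservation", "Creator A", "archive; microfilm"),
    ("vol02/issue01/b.tif", "Metadata basics", "Creator B", "cataloguing"),
])
print(search(db, "microfilm"))
```

In a web application the search term would come from an HTML form, and the rows returned would be rendered into a results page on the fly. 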


OCR does not actually convert an image into text but 
rather creates a separate file containing the text, while leaving 
the image intact. Four types of OCR technology 
prevail in the market. These technologies are: 


(a) Matrix/Template Matching : Compares each 
character with a template of the same character. Such a 
system is usually limited to a specific number of fonts, or must 
be ‘taught’ to recognize a particular font. 


(b) Feature Extraction : Can recognize a character from 
its structure and shape, e.g., angles, points, breaks, etc., 
based on a set of rules. The process claims to recognize all 
fonts. 


(c) Structural Analysis : Determines characters on the 
basis of density gradations or character darkness. 




(d) Neural Networking : It is a form of artificial 
intelligence that attempts to mimic processes of the human 
mind. Combined with traditional OCR techniques plus pattern 
recognition, a neural network-based system can perform text 
recognition and learn from its successes and failures. Referred 
to as intelligent character recognition (ICR), a neural 
network-based system can be used to recognize hand-written 
text as well as other traditionally difficult source material. 
Neural network ICR can consider characters in the context of 
an entire word. Newer ICR combines neural networking with 
fuzzy logic. 
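
The first of these techniques, matrix or template matching, is simple enough to sketch. The toy example below compares a scanned glyph with stored templates pixel by pixel and picks the closest one. The 3 x 5 templates and the function names are invented for the illustration; a real OCR engine works with far larger bitmaps and many fonts. 

```python
# Each template is a tiny bitmap: "1" is a black pixel, "0" a white one.
TEMPLATES = {
    "I": ("111",
          "010",
          "010",
          "010",
          "111"),
    "L": ("100",
          "100",
          "100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010",
          "010",
          "010"),
}


def mismatches(glyph, template):
    """Count pixel positions where the glyph and the template differ."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognise(glyph, templates=TEMPLATES):
    """Return the template character with the fewest mismatching pixels."""
    return min(templates, key=lambda ch: mismatches(glyph, templates[ch]))


noisy_t = ("111",
           "010",
           "011",   # one stray black pixel, as scanning noise
           "010",
           "010")
print(recognise(noisy_t))
```

Because the glyph is compared only with the templates it has been given, such a system is limited to the fonts it has been ‘taught’, which is the limitation noted under (a). 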


If archival preservation is one of the objectives, however, 
it is imperative to consider producing a more reliable medium, 
such as microfilm, alongside the process of digitization. 
Microfilming produces a high-resolution image 
on the microfilm / microfiche that equates to approximately 
1000 dpi in digital binary scanning. In comparison, a bitonal 
digital image can at best be scanned at 600 dpi for archival 
storage. The microfilm / microfiche can itself be used for 
conversion to an electronic image format. Moreover, future 
improvements in scanning technology can be utilized by 
rescanning the microfilm to obtain higher-resolution images. It 
is expected that electronic scanning will ultimately reach or 
exceed photographic quality. The durability and reliability of 
computers, storage media and the formats used for electronic 
image files may also increase and stabilize. 
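
The practical effect of these resolutions can be worked out with simple arithmetic. The sketch below computes the uncompressed size of a bitonal (one bit per pixel) page image at a given resolution. The 8.5 x 11 inch page is an assumption chosen for the example, and the function name is hypothetical. 

```python
def bitonal_size_bytes(width_in, height_in, dpi):
    """Uncompressed size of a 1-bit-per-pixel scan of a page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px + 7) // 8      # each row is padded to a whole byte
    return row_bytes * height_px


for dpi in (300, 600, 1000):
    size = bitonal_size_bytes(8.5, 11, dpi)   # page size is an assumption
    print(f"{dpi:>5} dpi: {size / 1_000_000:.1f} MB uncompressed")
```

Since the pixel count grows with the square of the resolution, an image at the 1000 dpi equivalent of microfilm holds roughly (1000/600)^2, or about 2.8 times, as many pixels as a 600 dpi scan of the same page. 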


3.1.13. Creating Portals or Gateways for the Electronic 
Collection available on the Web 


The web has become the most successful networked 
hypermedia-based system, allowing rapid access to a wide 
variety of networked information resources. It allows linking 
amongst electronic resources stored on servers dispersed 
across geographically distant locations. Portal sites or 
gateways redirect a user to the holders of the original digital 
material. A gateway may provide its own indexing and search 
services, or may combine original resources from a number of 
different providers. Portal sites or gateways restrict their 
operation to providing linkages to independent third party 
sources. The home pages of all the major education and 
research institutions, especially in the developed world, provide 
an organised and structured guide to electronic resources 
available on the Internet. 


Gateways emerged in response to the challenge of 
resource discovery in the fast-developing Internet environment 
of the early and mid 1990s. With the emergence of 
network information retrieval systems (Gopher, WWW, 
Archie, NetFirst, etc.) and access protocols (ftp, gopher, 
telnet, http, etc.), innovative information technologies and 
services appeared. The Electronic Libraries Programme (e-Lib) 
of the JISC of the UK Higher Education Funding Council, set up 
in 1995, included, among other things, access to 
network resources and subject gateways. These were funded as 
part of the access to network resources area, which later led 
to the funding and establishment of the e-Lib gateways. 


Many library web sites become congested with excessive 
content, overly general objectives and duplicated services. It 
is no longer simple or clear how to carry out research or 
projects without a tool that provides gateways to information. 
There are a number of steps that a library researcher must 
perform to obtain information successfully. Gateways are thus 
needed to improve the effectiveness of Internet searching; 
they serve as a source of information in specific areas 
and save the time of the users. 


Gateways do not exist in isolation; rather, they form part 
of the wider experience of resource discovery for the user. 
The searcher, whether child or professor, is 
faced with the compelling option of using the global services, 
such as Yahoo, AltaVista and Google, as a first step. The 
undifferentiated experience offered by such services can be 
compared with the specialist view offered by information 
gateways. Gateways offer the user an alternative to the 
generalist approach of the commercial global search engines, 
but in order to optimize the gateway service we need to gain 
a better understanding of users’ requirements for particular 
types of search during the research and learning process. 


Dempsey and others state that gateways are Internet 
services which support systematic resource discovery. They 
provide links to resources (documents, objects, sites or 
services) predominantly accessible via the Internet. The 
service is based on resource description, and browsing access 
to the resources via a subject structure is an important feature. 


A gateway has been defined by the Australian Subject 
Gateways Forum as: “a Web-based mechanism for accessing 
a collection of high quality, evaluated resources identified to 
support research in a particular subject discipline”. It is a 
service which is accessed via a portal, through open standard 
protocols such as Z39.50 and Harvest Broker. What the 
end-user sees is a well-structured Web-based interface. A 
simple definition of a subject gateway is: 


“An Internet accessible collection of descriptions and 
location details for a range of information, generally available 
electronically, organized by subject or discipline, and selected 
for inclusion based on a published set of quality criteria.” 


IMesh Toolkit Project provides the following definition 
of gateway: 


“A gateway is a web site that provides searchable and 
browsable access to online resources focused around a 
specific subject. Gateway resource descriptions are usually 
created manually rather than being generated via an 
automated process. Because the resource entries are 
generated by hand they are usually superior to those available 


from a conventional web search engine.” 


A further description of gateways as promoted by the 
DESIRE Project, funded by the European Commission, is: 


“Selective... gateways on the Internet are characterised 
by their quality control. The core activities of resource 
selection and description rely on skilled human input (by 
librarians, academics and experts) and are not activities that 
lend themselves to automation.” 


Gateways are characterized by the following key factors : 


Resource Selection : A human intermediary adds value 
by selecting appropriate Internet resources, usually according 
to rigid selection criteria. Resources are usually selected for 
their quality, authority, accessibility, currency and subject 
relevance. Other selection criteria may also apply such as 
language or geographic coverage. 


Collection Maintenance : Regular maintenance of the 
collection occurs, including the removal of resources that are 
no longer appropriate, are superseded or contain data entry 
errors. Regular link checks may also be carried out. These 
procedures may be automated or carried out manually 
through human intervention. 


Resource Description : Selected resources are 
annotated by a human intermediary with a full description of 
the resource. The descriptions are to be entered according 
to a predefined, structured metadata schema. The metadata 
are structured into separate fields, which enables the resource 
to be easily identified and located. It also facilitates structured 
searching. The descriptions may contain information about 
the content, author, publisher or publication date of the 
resource. 


Subject Classification : A human intermediary uses a 
subject classification scheme to index all resources. This 
facilitates subject browsing. 
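
The way fielded descriptions and subject classification support structured searching and browsing can be sketched as follows. The record fields, the sample entries and the `example.org` addresses are all hypothetical; a real gateway would follow a published metadata schema and classification scheme. 

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ResourceRecord:
    """One hand-made gateway entry; the field names are illustrative only."""
    title: str
    url: str
    description: str
    author: str = ""
    publisher: str = ""
    subjects: list = field(default_factory=list)   # classification headings


def search_field(records, field_name, term):
    """Structured search: match the term against a single named field."""
    term = term.lower()
    return [r for r in records if term in str(getattr(r, field_name)).lower()]


def subject_index(records):
    """Group record titles under each subject heading for browsing."""
    index = defaultdict(list)
    for record in records:
        for subject in record.subjects:
            index[subject].append(record.title)
    return {subject: sorted(titles) for subject, titles in sorted(index.items())}


records = [
    ResourceRecord("Bridge design notes", "http://example.org/a",
                   "Lecture notes", author="Smith", subjects=["Engineering"]),
    ResourceRecord("Survey methods", "http://example.org/b",
                   "Practical guide", author="Jones",
                   subjects=["Social Sciences", "Education"]),
]
print([r.title for r in search_field(records, "author", "smith")])
print(subject_index(records))
```

Because each piece of information sits in its own field, a search for an author does not return records that merely mention the same word in a description. 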




A gateway, however, is not a search engine; the two are 
different. Their differences are enumerated in Table 3.2. 


Table 3.2. Difference between Search Engine and Gateways 

Search Engine                          Gateways 

General resources are available.       It is a gathering place of 
                                       discipline-specific resources. 

It depends totally on the power of     A high level of human input is 
the selected search engine’s           there, as the resources must meet 
algorithms.                            a number of criteria applied by a 
                                       librarian or academic, who ensures 
                                       that only high quality, relevant 
                                       resources are included in the 
                                       database. 

The results can be overwhelming,       The results are specific, precise 
unmanageable and full of irrelevant    and linked to relevant documents. 
references, and are often too 
prolific to meet user needs. 

Records are created by an automatic    Records are created by a 
process and typically consist of a     cataloguer and are designed to 
mixture of metadata offered by the     highlight the main features of a 
author of the page, if this is         resource in an easily readable, 
available, and text picked up from     concise fashion. 
the page itself. 

Entries are displayed more as          Entries are described in a more 
“raw data”.                            “human-readable fashion”. 

It indexes pages.                      It indexes resources. 


Libraries are the most suitable institutions to undertake 
gateway formation work, for the following reasons : 

- The natural metaphor, 
- Browsing reference desk, 
- Expertise in relevant area, 
- Classification, acquisition, keywords, 
- Expertise in information seeking behaviour, and 
- Guiding and helping users. 


Gateways are useful to libraries and users in the 
following ways: 

- Leading the way into the information age, 
- Communicating with non-words, 
- Access to a huge, high-quality collection, 
- Integration into existing structures on the Internet, 
- Diverse resources brought together, 
- Research, learning, leisure and enrichment all brought 
  together, 
- Someone to ask: what’s where? and 
- What is good for them? 


Some of the key initiatives taken so far for building tools and 
standards for subject gateways are briefly described below. 


ROADS (Resource Organisation and Discovery in 
Subject-based Services) : It is funded by the JISC (Joint 
Information Systems Committee) through the e-Lib programme. 
It is an open source software toolkit which enables the 
set-up and maintenance of web-based gateways. A ROADS-based 
information gateway is built on a database that 
contains information about Internet resources. The records 
in the database contain information such as descriptions and 
keywords. The user is given access to this information while 
either browsing or searching the database. This is particularly 
important for geographically distant resources that might 
require some time and effort to access. The software includes 
the database technology required to set up a gateway. The 
free software can be downloaded from its site at 
http://www.ilrt.bris.ac.uk/road. 


DESIRE (Development of a European Service for 
Information on Research and Education) : This is one of the 
largest projects funded by the Telematics for Research Sector 
of the Fourth Framework Programme of the European 
Union. In particular, DESIRE intends to provide: 

- Tools for indexing and cataloguing information servers, 
- Tools for management and maintenance of information 
  servers, 
- Demonstration and evaluation of tools and techniques 
  for information caching and secure access to 
  information servers, 
- Background information for developers of networked 
  information systems, and 
- Training materials. 


DESIRE has published the “Information Gateways 
Handbook”, a guide for libraries interested in setting up 
large-scale subject gateways of their own. This handbook is 
freely available at its site (http://www.desire.org) and 
describes all the methods and tools required to set up a 
large-scale Internet gateway. 


Subject gateways or portals are variously 
called subject-based information gateways (SBIGs), 
subject-based gateways, subject index gateways, virtual 
libraries, clearing houses, subject trees, pathfinders, guides 
to Internet resources, and a few more variations thereof. They 
provide an organized and structured guide to Internet-based 
electronic information resources that are carefully selected 
after a predefined process of evaluation and filtration in a 
given subject area or specialty. Subject gateways redirect a 
user to the holders of the original digital material. They 
restrict their operation to providing linkages to 
independent third party sources. 


Some of the important subject gateways are: 

- LibrarySpot.com (http://www.libraryspot.com/) 
- Librarians’ Index to the Internet (LII) (http://lii.org/) 
- Argus Clearing House (http://www.clearinghouse.net/) 
- Galaxy (http://galaxy.einet.net/) 
- Direct Search (http://gwis2.circ.gwu.edu/~gprice/direct.htm) 
- Vlib: the Virtual Library (http://www.vlib.org/) 
- Academic Info (http://www.academicinfo.com/) 
- The Scout Report (http://scout.cs.wisc.edu/report/sr/current/) 
- LivingInternet.com (http://www.livinginternet.com/) 
- Edinburgh Engineering Virtual Library (EEVL) (http://www.eevl.ac.uk) 
- Social Science Information Gateway (SOSIG) (http://sosig.ac.uk/) 
- Digital Librarian (http://www.digital-librarian.com/) 
- QUEST.net (http://www.re-quest.net/) 
- Internet Public Library (http://www.ipl.org/) 
- BioMedNet (http://www.bmn.com/) 

A few of them are briefly described here: 

The Virtual Library (http://www.vlib.org/) : The Virtual 
Library is the oldest catalogue of the web, started by Tim 
Berners-Lee, the creator of HTML and the Web itself. Unlike 
commercial catalogues, it is run by a loose confederation of 
volunteers, who compile pages of key links for particular areas 
in which they are expert. Even though it is not the biggest 
index of the Web, the Virtual Library pages are widely 
recognized as being amongst the highest-quality guides to 
particular sections of the Web. Individual indexes live on 
hundreds of different servers around the world. A set of 
catalogue pages linking these pages is maintained at 
http://vlib.org. Mirrors of the catalogue are kept at East 
Anglia (UK), Geneva (Switzerland) and Argentina. Each 
maintainer is responsible for the content of their own pages, 
as long as they follow certain guidelines. 


Academic Info (http://www.academicinfo.com/): 
Academic Info, online since 1998, began as an independent 
Internet subject directory owned by Michael Madin and 
maintained with the assistance of a quality group of subject 
specialists. In the spring of 2000, Michael left the University 
of Washington Gallagher Law Library to focus solely on 
Academic Info. In 2002, Academic Info became a registered 
non-profit organization of the State of Washington. Academic 
Info is now ad-free and relies on donations to remain online. 
Academic Info aims to be the premier educational gateway 
to online high school, college and research level Internet 
resources. The primary focus of the site is academic, with its 
intended audience at the upper high school level or above. 
The priority is in adding digital collections from libraries, 
museums, and academic organizations and sites offering 
unique online content. The current focus is on English 
language resources but selectively sites in other languages 
may also be considered. Users can search by subjects like 
the Arts, Biological Sciences, Business, Digital Library, 
Education, Engineering, Health & Medicine, History, 
Humanities, Law & Government, Library & Information 
Science, Religion, Sciences, and Social Sciences. 


Librarians’ Index to the Internet (http://lii.org/) : The 
Librarians’ Index to the Internet (LII) consists of more than 
8,600 Internet resources selected and evaluated by librarians 
for their usefulness to users of public libraries. A free e-mail 
subscription to LII New This Week (http://www.lii.org/search/ntw) 
covers the most recent resources added to the 
LII. It has close to 12,000 subscribers in 85 countries. LII 
also offers a co-branding service to the libraries that are 
members of the Library of California. The site provides both 
browsing and searching interfaces. 


Argus Clearing House (http://www.clearinghouse.net/) : 
The Argus Clearinghouse is a guide to meta-resources. It 
provides a central access point for value-added topical 
guides that identify, describe and evaluate Internet-based 
information resources. It is a non-profit venture run by a small 
group of dedicated individuals. The Argus Clearinghouse is 
intended to be a resource that brings together finding aids for 
students, researchers, educators and others interested in 
locating authoritative information on the Internet. 


LibrarySpot.com (http://www.libraryspot.com/) : 
LibrarySpot is a free virtual library resource centre for 
educators and students, librarians and their patrons, families, 
businesses and just about anyone exploring the Web for 
valuable research information. LibrarySpot.com aims at 
breaking through the information overload of the web and 
bringing the best library and reference sites together. Sites 
featured on LibrarySpot.com are hand-selected and reviewed 
by an editorial team for their exceptional quality, content and 
utility. Published by StartSpot Mediaworks, Inc. in the 
Northwestern University / Evanston Research Park, 
LibrarySpot is designed to make finding the best topical 
information on the Internet a quick, easy and enjoyable 
experience. LibrarySpot.com has received more than 30 
awards and honours. Most recently, Forbes.com selected 
LibrarySpot.com as a “Forbes Favourite” site, the best in the 
reference category, and PC Magazine named it one of the 
Top 100 Web Sites. LibrarySpot.com has also been featured 
on CNN, Good Morning America, CNBC and in many other 
media outlets. 


Galaxy (http://galaxy.einet.net/) : Galaxy, originally a 
prototype associated with the DARPA-funded Manufacturing 
Automation and Design Engineering (MADE) program, is the 
oldest browsable / searchable web directory. Its mission is to 
provide contextually relevant information by integrating 
state-of-the-art technology with the human touch. Galaxy 
employs the best of technology and human expertise to organize 
information in a way that makes it both understandable and 
highly relevant to users’ needs. The information content of 
this meta-resource is compiled and organized by human 
Internet librarians rather than by computer. Its hierarchy is 
built on a vertical structure, i.e. the information on 
particular topics is very deep in content. While other search 
technologies may yield millions of pages per search, Galaxy 
provides concentrated, relevant results. 


Direct Search (http://gwis2.circ.gwu.edu/~gprice/direct.htm) : 
Direct Search is a growing compilation of links 
to Internet resources that contain data not easily or entirely 
searchable / accessible from general search tools like 
AltaVista, Google or HotBot. Direct Search has its own search 
interface. 


LivingInternet.com (http://www.livinginternet.com/) : The 
mission of this web site is to make comprehensive, in-depth 
information about the Internet available around the world. The 
site was developed from 1996 through 1999, posted on 
January 7, 2000, and is updated weekly. The site is equivalent 
to a book of more than 600 pages, with more than 2,000 
intra-site links and 2,000 external links woven into the text, 
making it the first Internet publication of a reference work 
fully integrated with the web on this scale. Google ranks this 
site number one in the Internet courses category, and Yahoo 
lists it as one of the top three sites on Internet history. 


Edinburgh Engineering Virtual Library (http:// 
www.eevl.ac.uk/) : Edinburgh Engineering Virtual Library 
(EEVL) is an award-winning free service, which provides quick 
and reliable access to the best engineering, mathematics, 
and computing information available on the internet. It is 
created and run by a team of information specialists from a 


Digital Libraries Components and Services 173 


number of universities and institutions in the UK for students, 
staff and researchers in higher and further education, as well 
as anyone else working, studying or looking for information 
in Engineering, Mathematics and Computing. EEVL provides 
a central access point to networked engineering, mathematics 
and computing information. Resources being added to the 
catalogues are selected, catalogued, classified and subject- 
indexed by experts to ensure that only current, high-quality 
and useful resources are included. They include e-journals, 
databases, training materials, professional societies, 
university and college departments, research projects, 
bibliographic databases, software, information services and 
recruitment agencies. EEVL, in addition to Internet Resource 
Catalogues, provides targeted engineering search engines 
to UK engineering sites, to engineering e-journals and to 
engineering newsgroups, and to specialized information 
services, such as the Recent Advances in Manufacturing 
(RAM) bibliographic database, and the Offshore Engineering 
Information Service. MathGate at EEVL is involved in the 
Secondary Homepages Project for UK Mathematics 
Departments. EEVL’s scope is limited to these three subjects,
and it is therefore more focused than the big search engines.
Searching EEVL retrieves high-quality resources, but because
its resources are handpicked, the number of sources it
covers is not comparable to that of the Internet search
engines.


Social Science Information Gateway (http://sosig.ac.uk/) :
The Social Science Information Gateway (SOSIG) is a freely 
available Internet service which aims to provide a trusted 
source of selected, high quality Internet information for 
students, academics, researchers and practitioners in the 
social sciences, business and law. It is a part of the UK 
Resource Discovery Network. SOSIG Internet Catalogue is 
an online database of high quality Internet resources. It offers 
users the chance to read descriptions of resources available 




over the internet and to access those resources directly. Its 
Catalogue points to thousands of resources and each one 
has been selected and described by a librarian or 
academician. The catalogue is browsable or searchable by 
subject area. Social Science Search Engine is a database of 
over 50,000 Social Science Web pages. Whereas subject 
experts have selected the resources found in the SOSIG 
Internet Catalogue, those in the Social Science Search Engine 
have been collected by software called a ‘harvester’. All the
pages collected stem from the main Internet catalogue,
which provides the equivalent of a social science search
engine.


Scout Report (http://scout.cs.wisc.edu/report/sr/current) :
The Scout Report is the flagship publication of the
Internet Scout Project, published every Friday both on the web
and by e-mail. It provides a fast, convenient way to stay 
informed of valuable resources on the Internet. A team of 
professional librarians and subject matter experts select, 
research, and annotate each resource. Published 
continuously since 1994, the Scout Report is one of the 
Internet’s oldest and most respected publications. The
Internet Scout Project is located in the Department of Computer
Sciences at the University of Wisconsin-Madison, and is 
funded by a grant from the National Science Foundation. 


Internet Public Library (http://www.ipl.org/) : Internet
Public Library is a product of the University of Michigan’s 
School of Information and Library Studies. It includes 
extensive directories of online texts, newspapers, magazines 
and reference materials; an exhibit hall and other special 
sections. At the moment, the home page features links to 
over 4,699 critical and biographical sites dedicated to authors 
and their works, and an online history of the Harlem 
Renaissance in New York between 1900 and 1940. 


Digital Librarian (http://www.digital-librarian.com/) : It
is maintained by Margaret Vail Anderson, a Librarian in
Cortland, New York. Internet information resources are
catalogued according to subject categories and format types.
Digital Librarian does not have a search interface for the
resources catalogued on the site; rather, it has a browsing
interface with ‘see’ and ‘see also’ references to related
resources.


QUEST.net (http://www.re-quest.net/) : QUEST.net is
a free online library offering substantive, fully annotated links
to valuable resources in both a unique framed version and a
non-framed version. The web site helps students and
professionals to locate day-to-day and much-needed
information and resources relatively quickly. It serves as a
one-stop resource directory, providing the Internet community
with thousands of links which its committee of web surfers
has found to be the most useful, informative and productive.
The meta resource provides a fully annotated description of
each link together with its URL, allowing visitors to know what
to expect from the web site. Each link has been hand-picked
to provide the best and most relevant links in each category.
The site’s committee of web surfers works day after day,
sorting through cyberspace to bring together the best and
most current resources available.


BioMedNet (http://www.bmn.com/) : BioMedNet is
owned by Elsevier Science, which is part of the Reed Elsevier
group of companies. BioMedNet is the web site for biological
and medical researchers. There are more than 800,000 members
of BioMedNet, with more than 20,000 people joining per month.
Membership of BioMedNet is free and members can search
all of BioMedNet without charge. However, viewing full-text
articles from publishers often requires payment or a
subscription. The site has links to more than 3,500 reviewed
information resources and provides online access
to more than 15,000 review articles. The BioMedNet
Magazine is issued every fortnight; it can be subscribed to
by e-mail or accessed online.


Other Subject Gateways include : 


Biz/ed - Business and Economics Education on the 
Internet (http://www.bized.ac.uk/) : Biz/ed is a free online
service for students, teachers and lecturers of business, 
economics, accounting, leisure and recreation and travel and 
tourism. The gateway contains a ROADS based Internet 
catalogue with over 1400 Internet resources selected and 
described by subject experts. 


Biz/ed is targeted at students and teachers in the post- 
16 education sector, covering schools, FE colleges, 
universities and beyond. The site offers support for 
economics, business, accounting, leisure and recreation and 
travel and tourism at many different levels including AVCE, 
AS and A2 level, International Baccalaureate, HNC, HND and 
MBA. The Biz/ed site is a unique combination of primary and
secondary teaching and learning resources. Resource 
discovery is integrated with simulations, worksheets, 
glossaries, spreadsheets, resource databases, online chat 
with examiners and a series of Virtual Worlds to give a rich 
package of support for teachers, lecturers and students. 


National Maritime Museum’s Port
(http://www.port.nmm.ac.uk/) : It is the National Maritime Museum’s
subject gateway to maritime information from the Internet.
This gateway provides access to searchable and browseable
catalogues of Internet based resources, all of which have been 
quality controlled or assessed before inclusion on the site. 
Librarians at the National Maritime Museum are actively 
involved in cataloguing and recording quality controlled online 
resources for this Port. 


The information on Port has been grouped into twenty
subject headings. Each subject heading divides into specific
groups of information, making Port easier for researchers
to browse and to search. An example of
this is the hierarchical structuring of Conflicts at Sea, in which
refined tiers of information on naval and military history lead
users to subject-specific information on battles and wars, such
as World War II, followed by resources on D-Day. All websites
included in the gateway are catalogued in records that assess 
the resource’s content, origin and nature. This means that 
users can access Port in the knowledge that they are looking 
at a quality-controlled collection of resources. When you use 
Port to locate online resources you know that it has the 
professional qualities and rigorous standards usually 
associated with such an institution as the National Maritime 
Museum. 


OMNI (http://www.omni.ac.uk/) : OMNI (Organising
Medical Networked Information) is a gateway to evaluated,
quality Internet resources in health and medicine, aimed at
students, researchers, academics and practitioners in the
health and medical sciences. OMNI is created by a core team
of information specialists and subject experts based at the
University of Nottingham Greenfield Medical Library, in
partnership with key organizations throughout the UK and
further afield. OMNI also provides training materials and
workshops. Browsing can be done via alphabetical
topics, classified topics, or MeSH headings. In addition,
OMNI provides a range of biomedical value-added services,
including a MEDLINE review section, mirrors of key NHS IT
strategy documents, and the UK CME database.


OMNI is one of the gateways within the BIOME service 
(http://biome.ac.uk/). BIOME is part of the Resource 
Discovery Network (RDN) at http://www.rdn.ac.uk/, and is 
funded by the Joint Information Systems Committee (JISC). 


ICSU Navigator for Primary Scientific Publications
(http://eos.wdcb.ru/icsu/navigator/navy.htm) : The main goal
of this project is to create a representative source of information
on the primary scientific publications in all disciplines covered
by the International Council for Science (ICSU).


The database basically will include descriptions of and 
links to materials, which are considered as primary scientific 
publications, i.e., scientific journals, serials and other relevant 
publications, which are published or approved by ICSU bodies 
as publications containing real science. The latter is the main
criterion for including a publication in the ICSU Navigator
database. The ICSU Press recommended focusing the
proposal in more narrow scientific areas and soliciting support 
and advice from the relevant unions. The recommended areas 
are Geophysics — with further transition to Earth Sciences 
and Physics. 


A portal is emerging as a significant modern tool for
retrieving and delivering the contents of e-resources more
quickly, efficiently and effectively through the Internet. A portal
is a web site that acts as a single source for all information on a
specific domain. An effective web portal offers the user a
broad array of information, arranged in the way that is most
convenient for the user to access. When designed,
implemented and maintained correctly, a web portal becomes
the starting or entry point for a web user, introducing him or her
to various information, resources and other sites on the Internet.
Popular portals are Yahoo, Google, MSN, etc.


Sun and Sun described a portal as “a web page that
serves as an entry point or gateway to resources and
services”. According to Denis Howe, a portal is “a website
that aims to be an entry point to the World Wide Web, typically
offering a search engine and/or links to useful pages,
and possibly news or other services”.


The defining characteristic of a portal is the user-driven
customizability of website content. A portal is only one possible
component of the library’s web presence.


“In the library community, portals may be defined as
an amalgamation of services to the patron. The amalgamation
is achieved through seamless integration of existing services.
It uses binding agents such as customization and
authentication services, search protocols (Z39.50), loan
protocols (such as ISO 10161), and e-commerce. The result
is a personalised service which allows the individual to access
the rich content of both print-based and electronic systems.”


Portals offer various advantages to us; some of them
are:


• They disseminate various types of information (events,
reports and programmes), knowledge, ideas, messages
and data.


• The available resources can be used remotely for
education and research purposes.


• They reduce the time needed to search for required
information as compared to the traditional way.


• They improve, manage and offer the knowledge and
experience of individuals or groups of individuals, which
are key assets for future generations.


• Knowledge portals improve the learning process and
help in developing the learning environment in
organizations.


• The portals allow sharing of all internal documents,
best practices, policies, procedures, and the expertise and
experience of individuals, as well as external documents.


• They improve the security of the content because they
allow access on a single platform on which viewing and
manipulation are protected.


• They allow various applications to be integrated into a
single database so that the relevant information can
be obtained as and when required.


• The portals provide the contents of various documents on a
single platform.


• Portals improve decision making through accurate
information.


• They reduce labour costs and paper-based documents
in organizations or institutions, and


• They help in day-to-day routine work.


As for types, there are many different types of portals,
each one tailored to meet a specific
business need.


Vertical Portals : These are web portals which focus
on only one specific industry, domain or vertical. Vertical
portals provide tools, information, articles, research
and statistics on the specific industry. As the web has become
a standard tool for business, portals provide an ideal gateway
for businesses to market their products and services and to
gain exposure within their vertical. Classic examples of
vertical portals are cnet.com, which focuses only on computers
and related issues, and mp3.com, which focuses only on
MP3 audio.


Horizontal Portals : These are web portals which focus
on a wide array of interests and topics. They address a general
audience and try to present something for everybody. Classic
examples of horizontal portals are yahoo.com, msn.com, etc.,
which provide visitors with information on a wide range of
topics.


Enterprise Portals : An enterprise portal, sometimes
called a corporate portal, provides personalized access to an
appropriate range of information about a particular company.
Enterprise portals were initially called intranet portals and
existed for the benefit of the company’s own employees; this
set of technologies has since developed to assist and provide
access to a company’s business partners (suppliers, customers)
as well. More advanced enterprise portal solutions provide access
via mobile devices, such as cell phones, PDAs and handheld
PCs, facilitating on-the-road work, decision making and
business processes.


The most common implementations of enterprise portals
focus on providing employees with this information in a
regularly updated manner, along with a document management
system, availability of applications on demand, online training
courses, webcasts, etc., and communication in the
form of e-mail, messaging, web meetings, etc.


Knowledge Portals : Knowledge portals increase the 
effectiveness of knowledge workers by providing easy access 
to information that is necessary or helpful to them in one or 
more specific roles. Knowledge portals are not mere intranet
or enterprise portals, since the former are supposed to
provide extra functionality such as collaboration services, 
sophisticated information discovery services and a knowledge 
map. 


Market Space Portals : Market space portals exist to 
support the business-to-business and business-to-customer 
e-commerce. These portals facilitate the sharing of 
information to external partners, customers and suppliers. 
They usually have a transaction processing component,
provide information on products and services, and include
supply chain management features. This type of portal aims
to increase the value of the relationship whilst lowering the
cost.


Self-Service Portals : Self-service portals allow 
employees, customers or suppliers to access information 
about themselves and to carry out certain business processes 
in a way that is suited to their own needs. Such portals are 
usually justified in terms of removing hard cost from the 
business through self-service options. 




Business Intelligence Portals : Business intelligence 
portals or decision portals empower users in their decision- 
making process. More than just allowing users to query and 
report across multiple data stores, business intelligence 
portals have built-in tools that provide targeted reports to end- 
user groups and individuals. 


Collaboration Portals : Collaboration portals enable a 
geographically dispersed workforce to interact around projects 
and business-as-usual tasks through a common access or 
rallying point. Such portals offer generic tools such as chat, 
white boards and threaded discussion streams along with 
ways to share objects such as maps, and documents. 


E-Learning Portals : No longer the domain of academic 
institutions alone, e-Learning portals focus on guiding 
students in the broadest sense through a structured learning 
experience. E-learning portals test abilities and provide 
feedback to the students in a personalized and confidential 
manner. They may also interact with other systems and 
business processes to provide in-context training and help. 


Communication Portals : Communication portals
aggregate various forms of messaging into a single place,
bringing together e-mail, voice, mobile, web feeds, etc., in a
way that allows access and control from multiple interfaces
and locations at any time. The individual can then tailor this,
choosing, for example, to receive and manage critical
communications regardless of where they are or what type
of device they have with them.


Workspace Portals : A workspace portal is a single,
coherent, integrated portal that presents its users with all the
information they need to carry out their jobs. Workspace portals
represent the radical vision of a portal providing the user
interface people always wanted and never had: a user
interface making available all the information necessary for
an employee’s job role. The current alternatives to a
workspace portal are specialized portals or the contemporary
Windows desktop. Thus, the advantages workspace portals
have to offer over these alternatives ought to be evident and
convincing.


Whatever its type, a flourishing portal provides good
collaboration support and good
integration of information sources. The major functions
of a portal are mentioned below:


Search and Navigation : Search and navigation form
the basis of most successful public web portals, which
means that a successful portal should support its users in an
efficient search for contents. A portal is at its best when it
provides the right information to the right users; it should also
provide additional information, and allow the user to voluntarily
personalize the information presented by the portal.


Personalization : Personalization is important for the
delivery of appropriate information to portal users. There
should be a mechanism ensuring that each user gets only the
information which is specifically tailored to his or her needs.
Personalization should be based on user roles as well as user
preferences. The different types of personalization are given below:


— Personalization of navigation, e.g., shortcuts to 
specific information, mostly known as bookmarks or 
favorites. 


— Personalization of layout, e.g., what information 
appears where on the screen, in which form, color and 
size? and 


— Personalization of data/content, e.g., which stocks one 
wants to see in the stock ticker. 
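

The three kinds of personalization listed above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names UserProfile, ROLE_DEFAULTS and build_page, and the sample roles and panels, are hypothetical and do not come from any particular portal product.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    role: str
    bookmarks: list = field(default_factory=list)      # personalization of navigation
    layout: dict = field(default_factory=dict)         # personalization of layout
    stock_symbols: list = field(default_factory=list)  # personalization of data/content

# Hypothetical role-based defaults: what each role sees before any preference is applied.
ROLE_DEFAULTS = {
    "student": ["catalogue", "e-journals"],
    "librarian": ["catalogue", "acquisitions", "reports"],
}

def build_page(profile):
    """Combine role-based defaults with the user's own preferences."""
    panels = list(ROLE_DEFAULTS.get(profile.role, ["catalogue"]))
    return {
        "panels": panels,                       # chosen by user role
        "shortcuts": list(profile.bookmarks),   # chosen by the user
        # user layout preferences override the portal defaults
        "layout": {"columns": 2, "font_size": "medium", **profile.layout},
        "ticker": list(profile.stock_symbols),
    }
```

Here the role decides which panels appear, while the user's own preferences decide shortcuts, layout and ticker content, matching the statement that personalization should rest on both roles and preferences.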


Information Integration : A portal should guarantee the
integration of information from disparate sources. Moreover,
the user should also be able to make optimal use of this information.
There are several mechanisms for doing this. One
promising interface technique is the Unified
Content API (Application Programming Interface), which
speeds up the development of portal applications. The Unified
Content API supports all current tools for developing web
environments, such as Java, C++, ActiveX, and visual and
non-visual JavaBeans.


Task Management and Workflow : Portals providing 
task management services can help users to take part in 
managing formally defined business processes. The workflow 
functionality allows the automation of business processes. 
Thus, as part of a workflow-automated business process, a 
portal should be able to prompt its users when they have 
tasks to perform. 


Notification : Notification, which is also known as push
technology, refers to a system in which a user receives
information automatically from a network server. Push
technologies are designed to send information and software
directly to a user’s desktop without the user actively requesting
it. Thus, the user has the opportunity to subscribe to active
information sources, such as news feeds and periodically
updated reports, and to ask to be alerted when documents are
updated.
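

The subscribe-and-push idea described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the mechanism of any real portal: the class NotificationService, its methods and the source names are assumptions made for the example, and a real system would deliver messages over a network rather than into a dictionary.

```python
from collections import defaultdict

class NotificationService:
    """In-memory sketch of push notification: users subscribe once,
    then receive messages without asking for them again."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # source name -> set of user ids
        self.inboxes = defaultdict(list)      # user id -> delivered (source, message) pairs

    def subscribe(self, user, source):
        self._subscribers[source].add(user)

    def publish(self, source, message):
        """Push the message to every subscriber of the source.
        Returns the number of deliveries made."""
        users = self._subscribers[source]
        for user in sorted(users):
            self.inboxes[user].append((source, message))
        return len(users)
```

A user who has subscribed to a source such as a news feed finds each new message in the inbox as soon as it is published; a source with no subscribers simply produces no deliveries.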


Collaboration and Groupware : Knowledge
management and groupware ensure that the required
information is stored in the right place in the right mode. By
this means the right persons are brought together with the
right information. Groupware assists in less formal
collaboration than workflow tools. As with workflow
automation, groupware increases the value delivered by many
types of specialized portals. For example, it:


— Increases the attractiveness of business-to-consumer
e-commerce portals. 




— Enables informal communication between suppliers 
and customers in business-to-business e-commerce 
portals. 


Supply chain portals are also dependent on
collaboration support in order to help suppliers and their
customers manage their relationships. Moreover, collaboration
support is a key requirement for knowledge portals.


Infrastructure functionality : The infrastructure
functionality constitutes the foundation of the work
environment. All the other functionalities mentioned
above are built on it. The runtime
infrastructure associated with the portal will have a primary
effect on manageability, scalability, security and availability.


Portals provide a broad range of services and content 
to a diverse range of customers. They do not target their 
services to a particular demographic group, industry or topical 
category. They can therefore be characterised as “horizontal” 
in scope. They offer at least web searching, news, reference
tools, access to online shopping, and some communication
capabilities such as free e-mail and chat.


Vortals are vertical portals which provide content
aggregation relevant to their industry, with links to related
industry, supplier and even competitor sites. They may have
community and collaboration capabilities and e-commerce
services for products and services relevant to their industry.
Vertical portals also try to leverage branding and associated
technologies in a focused way.


Internet Resource Catalogue is just one of the services 
offered by a gateway or portal. It is a database of Internet 
resource descriptions that is made accessible through a 
structured and/or unstructured network service. 


Besides these, there are Clearinghouses. A clearinghouse
contains computer-based education resources, with reviews
and links to related material. One particular clearinghouse
service was funded by the Australian Committee for University
Teaching and Staff Development. This type of service may
be superseded by IMS (IMS Global Learning Consortium
Inc.).


So gateways, in every discipline and subject, help the
users in academic libraries, where the thrust is on
cutting-edge technologies. Gateways serve as a ready reference tool.
It is hoped that this compilation will be useful for
departmental needs. While studies of this nature can never
be fully comprehensive and complete, attempts should be made
to narrow down the resources and services, thus helping
the users. Updating of the resources in such compilations is
absolutely necessary; only then will the gateway be of
importance and relevance.


3.1.14. Providing Integrated Access Interface 


Digital libraries typically integrate a multitude of
resources and media types. The constituents of a digital library
may include: (i) collections acquired in digital form; (ii) collections
digitized in-house; (iii) access to electronic resources including
e-journals; and (iv) subject gateways and the library OPACs. In
effect, a digital library may have not only a multitude of
resources but also a multitude of mechanisms to access these
resources. Most libraries that have sizable collections in digital
form have adopted a two-fold strategy: (i)
provide access to resources through the Library Catalogue
wherever possible; and (ii) provide access to electronic
resources and specialized collections through the Library
Home Page.


In the case of electronic journals, web access via an
alphabetical listing and/or subject index of all titles offers a
quick and simple means of providing access to journals, with
each title linked to its URL at the publisher’s site. Similarly,
access to other specialized collections can be provided through
a set of menus that serve as rough-and-ready finding aids. This
approach is particularly useful for institutions that have not
implemented web-based catalogues and cannot offer hypertext
links from a catalogue record. Its drawback is that access to
e-journals is separate from the online catalogue and from the
other journals that are part of the library’s collection. Web-based
catalogues, on the other hand, can
enable users to connect directly to the full-text source via
hypertext links in the catalogue record.


The acquisition of Endeavour Information Systems by
Elsevier Science marks the integration of Elsevier’s contents into
Endeavour’s digital library technology. A new product from
Endeavour called “ENCompass” is intended to provide a single,
seamless search across disparate remote and local
collections. Cetrix (http://www.cetrix.com/) and OCLC’s Site
Search (http://www.oclc.org/) also offer software-based
integration solutions.


3.2. ACCESS INFRASTRUCTURE 


An effective and efficient access mechanism that allows
a user to browse, search and navigate digital resources
becomes necessary as the electronic resources of a collection
grow in number and complexity. The access infrastructure
for a digital resource consists of webPACs and multi-webPACs
for library catalogues, specialized collection websites for
specialized image-based local collections, portals or subject
gateways for web resources, and a search and browse
interface for local collections.


3.2.1. Search and Browsing Interfaces 


The users interact with the digital library using its search
interface, which typically supports browsing, searching and
navigation. The search interface provides a visual window
for users to search and browse relevant information stored in
a digital resource and to display it.




Most digital libraries support searching with varying
degrees of capability, ranging from “simple search” to
“advanced search”. In the simple search mode, a user is
required to enter his or her “query” in the search box. In the
advanced search mode, a user can use Boolean queries,
wild cards, phrase searches and field-specific searches. Many
digital libraries also support relevance ranking of search results,
based on the relevance score of the retrieved documents.
Most digital libraries support full-text search in addition
to metadata-based searches. Digital libraries consisting of
images also support image searching based on the
names of objects appearing in the images. Digital libraries
built around Geographical Information Systems (GIS) with
geo-spatial data support retrieval of the relevant portion of maps,
which can be zoomed in or out. The National Geographic
Map Machine, available at
http://plasma.nationalgeographic.com/mapmachine/, is one such example.
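

The advanced search features named above (Boolean queries, wild cards, phrase searches and field-specific searches) can be shown with a small Python sketch. It is a toy under stated assumptions: the records, the functions term_matches and search, and the (term, field) condition format are hypothetical, phrases are matched by a simple substring test, and no ranking is attempted. Wild cards are handled with the standard library function fnmatchcase.

```python
from fnmatch import fnmatchcase

# Hypothetical catalogue records, used only for illustration.
RECORDS = [
    {"id": 1, "author": "Dhiman", "title": "digital library services"},
    {"id": 2, "author": "Rani", "title": "library consortia in India"},
    {"id": 3, "author": "Dhiman", "title": "metadata for digital preservation"},
]

def term_matches(record, term, field=None):
    """True if the term occurs in the given field (or in any field when field is None).
    A term containing a space is a phrase; * and ? act as wild cards in single words."""
    term = term.lower()
    names = [field] if field else [k for k in record if k != "id"]
    for name in names:
        text = str(record.get(name, "")).lower()
        if " " in term:
            if term in text:  # phrase search (plain substring test)
                return True
        elif any(fnmatchcase(word, term) for word in text.split()):
            return True
    return False

def search(records, all_of=(), any_of=(), none_of=()):
    """Boolean search: every all_of condition (AND), at least one any_of
    condition if any are given (OR), and no none_of condition (NOT).
    Each condition is a (term, field_or_None) pair. Returns matching ids."""
    hits = []
    for rec in records:
        if not all(term_matches(rec, t, f) for t, f in all_of):
            continue
        if any_of and not any(term_matches(rec, t, f) for t, f in any_of):
            continue
        if any(term_matches(rec, t, f) for t, f in none_of):
            continue
        hits.append(rec["id"])
    return hits
```

For instance, search(RECORDS, all_of=[("librar*", "title")]) is a field-specific wild card search on the title field, and adding a none_of condition turns a query into "Dhiman NOT metadata".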


A typical digital library implementation may employ a
variety of information retrieval techniques including metadata
searching, full-text document searching and content
searching, or a combination of two or all of them. Information
retrieval is made more effective and user-friendly by
preprocessing digital documents to extract additional
metadata before storing them in a database. The database
is then configured to generate indices from selected fields
including author(s), titles, abstracts, etc., or it may also be
configured to generate indices from the full-text articles with
a pre-defined stop-word list. Depending upon the
implementation of the digital library, the search conducted may
be restricted to a single server or extended to several servers
geographically dispersed at distant locations. Digital libraries
also support “federated searches”, wherein the search query
is sent to search systems on different servers and the results
received from the different systems are merged and presented to
the user. A typical example of “federated searching” is the
Networked Digital Library of Theses and Dissertations
(NDLTD) project, available at http://www.ndltd.org/.
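

The federated search described above, in which one query goes to several servers and the results are merged, can be sketched as follows. This is a minimal illustration and not how NDLTD or any particular system is implemented: the function federated_search is hypothetical, each server is represented by a plain Python function returning (document id, score) pairs, and the sketch assumes that scores from different servers are already comparable, which real systems cannot take for granted.

```python
def federated_search(query, servers, limit=10):
    """Send the query to every server, merge the results and rank them.

    servers maps a server name to a function that takes the query and
    returns a list of (doc_id, score) pairs. A document returned by
    several servers is kept once, with its best score.
    """
    merged = {}
    for name, search_fn in servers.items():
        try:
            results = search_fn(query)
        except Exception:
            continue  # one unreachable server should not fail the whole search
        for doc_id, score in results:
            best = merged.get(doc_id)
            if best is None or score > best[0]:
                merged[doc_id] = (score, name)
    # highest score first; ties broken by document id for a stable order
    ranked = sorted(merged.items(), key=lambda item: (-item[1][0], item[0]))
    return [(doc_id, score, source) for doc_id, (score, source) in ranked[:limit]]
```

The user sees a single ranked list with the source of each hit, while duplicate records held by more than one server are shown only once.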


Besides a search interface, a browsing interface is also
a necessity for a digital library, to give a user a sense of the
amount and variety of material, and of the attributes of the
materials, available in the digital library. Taking advantage
of the flexibility of electronic presentation, digital libraries can
offer several options for browsing the collection. Browsing
helps a user in selecting a collection and locating sets of items
in a collection with similar attributes. It helps users to learn
about the collection in general, the topics covered and the kinds of
material available in a digital collection, thus helping them to
formulate their search queries. The browsing interface of a
digital library generally consists of a combination of
hierarchical menus and selection buttons, where the interface
guides the user from the top-level subject category
through a series of progressively narrower levels within the
category, so that the user can select and retrieve associated digital
objects from the digital library. The browsing interface for a
full-text library, for example, may consist of (i) research articles
arranged alphabetically by author’s name, with article title
and year of publication as selectable criteria; and (ii) a
hierarchical presentation of research articles under subject
categories. Most digital libraries support browsing
through tables of contents which are linked to the full text
or to specific chapters and sections.
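

Hierarchical browsing of the kind described above, from a top-level subject category down through progressively narrower levels, can be sketched with a nested dictionary. The subject tree, the article titles and the function browse are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical subject hierarchy: categories map to sub-categories,
# and the lowest level holds a list of article titles.
SUBJECT_TREE = {
    "Engineering": {
        "Civil": ["Bridge design notes"],
        "Electrical": ["Power systems review", "Circuit theory primer"],
    },
    "Social Sciences": {
        "Economics": ["Trade policy survey"],
    },
}

def browse(tree, path=()):
    """Return the sorted choices offered at the given path:
    sub-category names, or article titles at the lowest level."""
    node = tree
    for step in path:
        node = node[step]  # descend one level per selected category
    return sorted(node)    # works for both dict keys and title lists
```

Calling browse(SUBJECT_TREE) lists the top-level categories; each further element in path corresponds to one selection made by the user in the menu.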


Designing interfaces for digital libraries involves combining
the principles and practices of information management with
rapidly evolving technological developments. The interfaces
should maximize users’ interaction with information resources
and minimize the attention they must give to the system itself.
Marchionini prescribes the following goals for designing an
interface for a digital library:


— Minimize disorientation by reducing navigation and
anchoring users in a consistent context;


— Provide primary information at the earliest point in the 
interaction as far as possible, instead of forcing a user 
to navigate through deep menu hierarchies or execute 
a query; and 


— Support rapid relevance decisions through overviews
and previews.


New developments in web technology allow creation
of user-friendly interfaces providing features and facilities
hitherto impossible in traditional command-based or
menu-based search interfaces. Several important search
components that a user had to type into a menu-based or
command-based interface can now be offered as selection /
combination boxes or radio buttons for the user to choose from.
New opportunities and better search interfaces in the web
environment have attracted many online search services to
migrate to the new technology.


3.2.2. Information Retrieval in Digital Library 


A digital library implementation may employ a variety
of information retrieval techniques, including metadata
searching, full-text document searching and content
searching, or a combination of two or all of them. Information
retrieval is made more effective and user-friendly by
preprocessing digital documents to extract additional
metadata before storing them in a database. The database
is then configured to generate indices from selected fields
including author(s), titles, abstracts, etc., or it may also be
configured to generate indices from the full-text articles with
a pre-defined stop-word list. The success of information
retrieval can be measured in terms of percentage of relevant 
“Contents” or “Table of Contents” (ToC) primarily extend a 
browsing interface to the users although search can also be 
restricted to the ToC in a digital library implementation. 
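
The indexing step described in this section, where indices are generated either from selected fields or from the full text with a stop-word list, can be illustrated with a small inverted index. The sketch below is a minimal example and not the method of any particular retrieval package; the stop-word list, the sample documents and the function names are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; a real system would use a longer, tuned list.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text):
    """Lower-case the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_index(documents, fields):
    """Map each term found in the chosen fields to the set of document ids."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for field in fields:
            for term in tokenize(doc.get(field, "")):
                index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every non-stop-word query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents used only to exercise the functions above.
DOCS = {
    1: {"title": "Metadata for Digital Libraries",
        "abstract": "Describes cataloguing of digital objects."},
    2: {"title": "Preservation of Digital Objects",
        "abstract": "Storage and migration strategies."},
}
```

Passing `["title"]` instead of `["title", "abstract"]` to `build_index` restricts searching to one field, which corresponds to configuring the database to index selected fields only.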


3.2.3. Meta Resources, Portal or Knowledge Gateways 


The meta resources, portal sites or subject gateways
redirect a user to the site holding the original material.
Portals and subject gateways have already been discussed
earlier in this chapter.


3.3. COMPUTER AND NETWORK INFRASTRUCTURE 


A typical digital library in a distributed client-server 
environment consists of hardware and software components 
at the server's side as well as at the client’s side. Clients are 
machines that are used for accessing the digital library by 
users while the server hosts databases, digital objects, browse 
and search interfaces to facilitate its access. 


3.3.1. Server-side Hardware Components 


Servers are the heart of a digital library. Servers for 
digital library implementation need to be computationally 
powerful, and must have adequate main memory to handle 
the expected work, a large amount of secure disc storage for 
the database(s) and digital objects and have good 
communication capabilities. A digital library may need a 
number of specialized servers for different tasks so as to 
distribute the workload on to different servers. It would require 
one or more library server(s) to host indices and databases 
and one or more object server(s) to store digital objects and 
other multimedia objects. However, for a smaller library, many
distinct activities can be performed on a single server. It is
important that the server be scalable (for example, a Sun
Enterprise Server), so that additional storage, processing power
or networking capabilities can be added whenever required.


3.3.2. Input Devices 


Image-based digital library implementations require input
devices like scanners, digital cameras, video cameras and
PhotoCD systems. A large range of choices is available for
these image capturing devices. Scanners are available in all
sizes and shapes. Flatbed scanners or digital cameras
mounted on a book cradle are more suitable for libraries.


3.3.3. Storage Devices 


Since digital libraries require large amounts of storage,
particular attention needs to be given to the storage solution.
Digital library collections that are too large to store entirely on
disk use hierarchical storage management (HSM). In an HSM system,
the most frequently used data are kept on fast disks, while
less frequently used data are kept in nearline storage such as an
automated tape library. An HSM system can automatically migrate
data from tape to disk and vice versa, as required. Intelligent
storage networks and snap-servers are now available in which
the physical storage devices are intelligently controlled and
made available to a number of servers. Although hard disk
solutions, both fixed and removable, are increasingly available at
an affordable cost, optical storage devices including WORM,
CD-R, CD-ROM, DVD-ROM or magneto-optical devices, in
standalone or networked mode, are attractive alternatives for
long-term storage of digital information. Optical drives record
information by writing data onto the disc with a laser beam.
The media offer enormous storage capacities. A number of
RAID (Redundant Array of Inexpensive Disks) models are
also available for greater security and performance. RAID
technology distributes the data across a number of disks in
such a way that even if a disk fails (or, at some RAID levels,
more than one), the system can still function while the failed
component is replaced.
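
How a disk array keeps working after a failure can be shown with a single-parity scheme of the kind used at some RAID levels. The sketch below is a simplified illustration, assuming equal-sized blocks and one parity block per stripe; the function names are invented for the example, and real RAID controllers work on disk sectors rather than Python byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Return the data blocks plus one parity block, one block per disk."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_disk):
    """Recompute the block held by a single failed disk from the surviving disks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_disk]
    return xor_blocks(survivors)
```

Because the parity block is the XOR of all data blocks, XOR-ing the surviving blocks reproduces whichever single block is missing. A scheme with one parity block tolerates only one failed disk per stripe; tolerating more requires additional redundancy.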


3.3.4. Communication Devices 


Setting up a digital library requires network and
communication equipment like communication switches,
routers, hubs, repeaters, modems and other items required
in a Local Area Network. These hardware and software items
are required for setting up any network and are not specific
to a digital library.


3.3.5. Server-side Software Components 


A typical digital library requires a number of software
packages to handle its highly diversified resources, activities
and services. Different software packages are required to
handle different components and activities of a digital library.
For example, creating digital objects by scanning
documents requires document imaging software, converting
material already available in digital format into PDF would
require the Acrobat software suite, and organizing digital
objects with their associated metadata would require an RDBMS
package. Since a single integrated software package from a
single vendor is not available, digital library software may
be a system with components added onto an open
architecture framework. However, there are a few software
packages that attempt to provide a number of functions of a
digital library in an integrated fashion. Some of them are:


IBM Digital Library is an integrated solution for storage,
management and distribution of all types of digital content
including text, images, audio and video. It incorporates
functions for creating and capturing, storage and management,
search and access, distribution and rights management of
digital intellectual content in an open, scalable, multi-platform
environment such as Windows NT and AIX. Successful
installations of IBM Digital Library include the ISI Electronic
Library Project, Indiana University School of Music, Case Western
Reserve University and the State University System of Florida.


Greenstone is a suite of software which has the ability
to serve digital library collections and build new collections. It
provides a new way of organizing information and publishing
it on the Internet or on CD-ROM. The Greenstone Digital
Library Software is produced by the New Zealand Digital
Library Project at the University of Waikato, and distributed
in cooperation with UNESCO and the Humanities Library
Project. It is open-source software, available from
http://greenstone.org under the terms of the GNU General Public
License. Greenstone runs on Windows and Unix platforms.
Its distribution includes ready-to-use binaries for all versions
of Windows and for Linux. It also includes the complete source
code for the system, which can be compiled using Microsoft
C++ or gcc. Greenstone works with freely available associated
software such as the Apache web server and Perl.


Ganesha Digital Library (http://gdl.itb.ac.id/) is
developed under the Indonesian Digital Library Network
(IndonesiaDLN). Ganesha Digital Library enables institutions
or individuals to share their knowledge and, at the same time,
to access and utilize the knowledge in the Indonesian
'giant memory' formed by the network of IndonesiaDLN digital
libraries. The software is available in three publisher editions:
Personal, Internet Cafe, and Institution.


Elsevier's Science Server is an information system that
provides integrated access to databases and digital collections
hosted on local Intranet servers as well as to other international
bibliographic and full-text databases that the library is
authorized to use. It provides easy and centralized access,
through a single interface, to multiple information sources,
including local Intranet resources like electronic
journals and abstracting and indexing services, and remote
subscribed Internet services including electronic journals and
online databases. Science Server offers tools to create a fast,
powerful system with proven scalability and performance, and
browsing and full-text searching capabilities from a single
intuitive web interface. Science Server supports several
platforms including Sun Solaris, Digital Unix, HP Unix, IBM AIX,
Unix and Windows NT.


Dienst is a digital library system developed at
Cornell University, with tasks clearly divided and specified by
a protocol based on HTTP and eventually using XML. Dienst
was originally developed to support distributed operation and
federated search for the Networked Computer Science Technical
Reference Library (NCSTRL) (http://www.ncstrl.org/) project.
Repository-in-a-Box from the University of Tennessee
and the E-print software from Southampton University are
alternatives. Both of these packages avoid the complexity
involved in digital libraries by defining workflows that are not
easy to change.


A number of digital libraries are being constructed at
present using a mixture of information retrieval, media
management and web server packages. All these pieces of
software need to be integrated so as to present a cohesive
environment and to avoid problems with growth and
expansion. Some of the important software used in the
construction of a digital library is described in the
sections that follow.


3.3.6. Image Capturing or Scanning Software 


The process of converting a paper document into a
computer-processible digital image is carried out with software
variously called a document imaging system, electronic filing
system or document management system, etc. Simple
scanning software also comes with scanners. Two
important document imaging packages from India are:


— OmniDoc ver. 1 (Newgen Software) (http://www.newgensoft.com/).


— Data Scan (Stacks Software Pvt. Ltd.) (http://www.stex.com/).


3.3.7. Image Enhancement and Manipulation 


The captured images may need manipulation to
enhance their quality. Image enhancement features include
filters, tonal reproduction, colour management, touch-up,
cropping, image sharpening, contrast adjustment and
transparent backgrounds. A few important image
enhancement packages are:


— Adobe's Photoshop 9.0 (http://www.adobe.com/). 
— Jasc Inc.’s Paintshop Pro 6.02 (http://www.jasc.com/). 


— Eastman Software, Inc. (http://www.eastmansoftware.com/).


— Corel Corporation (http://www.corel.com/). 
— Alchemy Mindworks (http://www.alchemy.com/). 


3.3.8. Integrated Library Systems 


Automation of library functions does not by itself make a
library digital. However, a digital library must be automated in
most of its essential functions in a hybrid library environment.
Some of the important Integrated Library Systems available in
India are given below:


— Libsys ver. 4 (Libsys Corporation) (http://www.libsys.net/).
— TLMS (OPAC Infosys Pvt. Ltd.) (http://www.tlms.net/).
— OASIS / Alice (Softlink Asia) (http://softlinks.com.au/).
— Basis Plus and TechLib (Information Dimensions;
marketed in India by NIC) (http://www.idi.ocic.org/).
— DELMARC and DELPLUS (http://delnet.ren.nic.in/).
— Suchika (DESIDOC) (http://www.drdo.org/).
— SANJAY (DESIDOC / NISSAT) (http://www.nisat.org/).
— MAITRAYEE (CMC India) (http://www.cmcltd.com/).
— Granthalaya (INSDOC, Delhi) (http://www.insdoc.org/).


3.3.9. Web Servers 


Setting up a web-based digital library requires a web
server program. Many server programs are available for
different platforms, each with different features and with costs
varying from free to very expensive. Some of the important
web server programs are:

Servers for Unix Systems

— NCSA HTTPd (http://www.ncsa.uiuc.edu/).
— Apache (http://www.apache.org/).
— Jigsaw 2.1.1 (http://www.w3.org/jigsaw/).
— Netra (for Sun Solaris) (http://sun.com/).

Servers for Windows NT

— Internet Information Server (IIS) (http://www.microsoft.com/iis/).


3.3.10. Information Retrieval 

Internet search engines may be used on their own or
be connected to an integrated library system or DBMS to
provide a fully searchable collection. All Internet search
engines are basically free-text search engines, i.e., they index
each and every word in a document. Important search engines
that can be downloaded free of cost for installation at the local
site, or that can be interfaced to a local site for searching, are:

— ICE: Indexing Kit for Web Servers
(http://www.objectweaver.de/ice/).
— Extropia
(http://www.extropia.com/scripts/keyword_search.html).
— Oingo Free Search
(http://www.oingo.com/oingo_free_search/products.html).
— Swish-E (http://www.berkeley.edu/SWISH-E/).
— Web Search (http://awsd.com/scripts/websearch/index.html).
— WhatUseek intraSearch (http://intra.whatuseek.com/).
— Excite (http://excite.com/).


Besides, a number of information retrieval software 
packages offer global finding aids that make an entire digital 
collection more accessible, i.e., without sacrificing the 
metadata and thesauri of each individual resource. These 
packages are: 


— KnowledgeCite Library (Silver Platter)
(http://www.knowledgecite.com/).


— Database Adviser (http://scilib.uscd.edu/proj/dba/). 
— Pharos (http://uias.calstate.edu/). 
— Northern Light (http://northernlight.com/). 


3.3.11. Optical Character Recognition (OCR) 


Most document imaging software have in-built OCR 
packages. However, OCR packages are also available as 
separate utilities. Some of the important OCR packages are: 


— Text Bridge (ScanSoft) (http://www.scansoft.com/). 
— OmniPage (Caere) (http://www.scansoft.com/). 


3.3.12. Database Management Software 


Database management software provides structured
storage and retrieval facilities for the contents of a digital library.
Digital libraries use a variety of database management
systems, ranging from relational and extended relational
database management systems to object-oriented database
systems. Relational DBMS are most often used for the storage
of metadata and indices, with attributes that contain pointers
to files in a file system. Most commercial RDBMS also
support storage of binary large objects (BLOBs).
Object-oriented database systems are slowly gaining acceptance.
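
The arrangement described above, with metadata held in relational tables and a column pointing to the file that contains the digital object, can be sketched with SQLite, which ships with Python. SQLite stands in here for the commercial RDBMS packages discussed in this section; the table layout, column names, sample rows and file paths are hypothetical.

```python
import sqlite3

# An in-memory database is used so that the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE digital_object (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author    TEXT,
        file_path TEXT,   -- pointer to the object in the file system
        thumbnail BLOB    -- small binary object stored in the database itself
    )
""")
# An index on a selected metadata field speeds up searches on that field.
conn.execute("CREATE INDEX idx_author ON digital_object(author)")
conn.executemany(
    "INSERT INTO digital_object (title, author, file_path, thumbnail) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Annual Report 1998", "Sharma", "/objects/0001/report.pdf", b"\x89PNG..."),
        ("Campus Map", "Verma", "/objects/0002/map.tif", None),
    ],
)


def find_by_author(connection, author):
    """Return (title, file_path) pairs for one author, using an SQL query."""
    rows = connection.execute(
        "SELECT title, file_path FROM digital_object WHERE author = ? ORDER BY title",
        (author,),
    )
    return rows.fetchall()
```

The query returns the pointer rather than the document itself; the application would then fetch the file from the object store. Whether large objects are better kept as BLOBs or as files is a design choice that depends on the system.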


The relational DBMS software listed below can be
accessed by using SQL (Structured Query Language):


— Oracle (http://www.oracle.com/)
— Informix (http://www.informix.com/)
— Sybase (http://www.sybase.com/)
— SQL Server (http://www.microsoft.com/)


3.3.13. Rights Management Software


Rights management software controls and monitors access
to the contents of a digital library. The software listed below
also provides protection from unauthorized access and
manipulation of data:


— InterTrust Systems Developer's Kit (InterTrust, Inc.)
(http://intertrust.com/).


3.3.14. Client-side Hardware and Software Components 


Clients are the machines that reside on the users' desks.
The planners of the digital library, therefore, need to prescribe
the minimum level of hardware and software that a user would
require so as to achieve efficient and effective interaction with
the digital library. Most digital libraries require an
Internet-enabled multimedia PC or Macintosh equipped with an
Internet browser like Internet Explorer or Netscape Navigator
as their clients. The client-side PCs may also require the
software packages (plug-ins) given in Table 3.3 to download
format-specific deliverables from a digital library:


Table 3.3. Plugins Software

Software                     Used For                        Website

Internet Explorer 5.0        Internet browser                http://www.microsoft.com/
Netscape Navigator 6.0       Internet browser                http://home.netscape.com/
Acrobat Reader 5.0 (Adobe)   PDF files                       http://www.adobe.com/
Real Player 7.0              Audio and video                 http://www.real.com/
WS_FTP Pro 6.0               File transfer client            http://www.ipswitch.com/
Microsoft Office             Display and printing of         http://www.microsoft.com/
                             MS Word, MS Access
TIFF Viewer                  TIFF images                     http://www.alternatiff.com/


3.4. DIGITAL RESOURCE ORGANIZATION 


Digital contents in a digital library may include a
combination of structured / unstructured text, numeric data,
scanned images and other multimedia objects. These digital
objects need to be organized and made accessible to the
user community. As digital libraries are built around Web and
Internet technology, they use the objects and addressing
protocols of the Internet.


3.4.1. Object Naming and Addressing 


Uniform Resource Locators (URLs) describe the
location of an object on the Internet and the protocol used to
access it. A URL is a pointer or an address to information
available on the Internet, be it a document on the web, a file
on an FTP server or Gopher, a posting on USENET or an e-mail
address. For example:

Protocol     Name
http://      www.sciencedirect.com
gopher://    gopher.tc.umn.edu
ftp://       oak.oakland.edu
mailto:      jarora@library.iitd.ernet.in
             (in the last name, "ernet" is the network and "in" the country)
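
The split between protocol and name shown in the examples above can be reproduced with the URL parser in Python's standard library. This is a minimal sketch; the helper names `protocol_and_name` and `domain_labels` are invented for the illustration.

```python
from urllib.parse import urlsplit


def protocol_and_name(url):
    """Return the protocol (scheme) and the machine name of a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname


def domain_labels(host):
    """Split a machine name into its dot-separated labels."""
    return host.split(".")


# The three URLs used as examples in the text.
EXAMPLES = [
    "http://www.sciencedirect.com",
    "gopher://gopher.tc.umn.edu",
    "ftp://oak.oakland.edu",
]
```

For a machine name such as library.iitd.ernet.in, the last label returned by `domain_labels` is the country code and the one before it is the network.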

The URL provides a universal method for finding and
accessing information on the web. URLs are the links between
sites or pages on the web that can be hyperlinked to provide
the navigational functionality of the web. The current method
of identifying objects on the Internet, i.e., the URL, consists of
the protocol with which a document is accessed (e.g. HTTP), a
machine name, a document path and a document file name,
which may or may not be unique. URLs are not persistent:
whenever a file is moved, links to it break and the document is
often lost entirely. A global scheme of unique identifiers is,
therefore, required that is persistent and is not tied to specific
locations or processes. These names must remain valid whenever
documents are moved from one location to another, or are
migrated from one storage medium to another. Three
schemes proposed to resolve the problem of persistent
naming are PURLs, URNs and Digital Object Identifiers.


PURLs, or persistent URLs, are a scheme developed
by OCLC in an attempt to separate a document's name from
its location and, therefore, increase the probability that it will
always be found. PURLs work through a mapping of a unique,
never-changing PURL to an actual URL. If a document moves,
the URL is updated, but the PURL stays the same. In
operation, a user requests a document through a PURL, a
PURL server looks up the corresponding URL in a database,
and then the URL is used to pass the document to the user.


PURLs look like URLs. A typical PURL is shown below
in Fig. 3.5.


http://purl.oclc.org/OCLC/PURL/FAQ

Protocol: http://
Resolver address: purl.oclc.org
Name: /OCLC/PURL/FAQ

Fig. 3.5. Example of a Typical PURL


Functionally, a PURL is a URL; however, instead of
pointing directly to the location of an Internet resource, a PURL
points to an intermediate resolution service. The PURL
resolution service associates the PURL with the actual URL
and returns that URL to the client. The client can then
complete the URL transaction in the normal fashion. In Web
parlance, this is a standard HTTP "redirect".
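
The resolution step can be sketched as a lookup table that answers each request with an HTTP redirect. The class below is a toy stand-in written for illustration, not the OCLC PURL software; the class name, the method names and the example.org addresses are assumptions, and status code 302 is used as one common choice for a redirect.

```python
class PurlResolver:
    """In-memory stand-in for the lookup table kept by a PURL server."""

    def __init__(self):
        self._table = {}

    def register(self, purl_path, url):
        """Create a PURL and associate it with the current URL."""
        self._table[purl_path] = url

    def update(self, purl_path, new_url):
        """Record a new location; the PURL itself never changes."""
        if purl_path not in self._table:
            raise KeyError(purl_path)
        self._table[purl_path] = new_url

    def resolve(self, purl_path):
        """Return (status, headers) as an HTTP server would for a redirect."""
        url = self._table.get(purl_path)
        if url is None:
            return 404, {}
        return 302, {"Location": url}
```

When the document moves, only `update` is called; every link that cites the PURL continues to work because the client is simply redirected to the new address.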


[Figure: diagram showing the client, the PURL server that returns
the URL, and the resource server that holds the resource.]

Fig. 3.6. Resolver Associates with PURL Server


The resolver associates each PURL with a unique URL, and
maintenance utilities facilitate the creation of PURLs and the
modification of their associated URLs, as shown in Fig. 3.6.
The OCLC has operated its PURL Service since the beginning
of the year 2—3 at http://purl.oclc.org.


Uniform Resource Name (URN) is a development of
the Internet Engineering Task Force (IETF). A URN is not a
naming scheme in itself, but a framework for defining
identifiers. URNs contain a naming authority identifier (a central
authority given the task of assigning identifiers) and an object
identifier (assigned by the central authority). Like PURLs,
URNs must be resolved, through a database or other such
system, into actual URLs. Unlike PURLs, however, a URN
can be resolved into more than one URL, such as one for
each of several different formats.
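
The two parts of a URN and its resolution into several URLs can be sketched as follows. The sketch assumes the common written form urn:<naming authority>:<object identifier>; the resolution table, the "example" naming authority and the repository addresses are hypothetical and stand in for whatever database a real resolver would consult.

```python
def parse_urn(urn):
    """Split 'urn:<authority>:<object>' into its two identifiers."""
    scheme, _, rest = urn.partition(":")
    authority, _, obj = rest.partition(":")
    if scheme.lower() != "urn" or not authority or not obj:
        raise ValueError(f"not a URN: {urn!r}")
    return authority.lower(), obj


# Hypothetical resolution table: one URN maps to one URL per format.
RESOLUTION_TABLE = {
    ("example", "report-42"): {
        "pdf": "http://repository.example.org/report-42.pdf",
        "html": "http://repository.example.org/report-42.html",
    },
}


def resolve_urn(urn, fmt=None):
    """Return every URL for the URN, or only the URL for one format."""
    locations = RESOLUTION_TABLE.get(parse_urn(urn), {})
    if fmt is None:
        return sorted(locations.values())
    return [locations[fmt]] if fmt in locations else []
```

Calling `resolve_urn` without a format returns all known locations, which illustrates the difference from a PURL, where each name maps to exactly one URL.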


Digital Object Identifier (DOI) is an initiative of the 
Association of American Publishers and the Corporation for 
National Research Initiatives (CNRI) designed to provide a 
method by which digital objects can be reliably identified and 
accessed. The CNRI Handle system, which underlies DOI, 
is a system that resolves digital identifiers into the information 
required to locate and access a digital object. The main
impetus of the DOI system is to provide publishers with a
method by which the intellectual property rights issues
associated with their materials can be managed.


The DOI System consists of the following four 
components: 


(i) Enumeration : assigning an alphanumeric string to the 
digital object that the DOI identifies. 


(ii) Description : creating a description (“metadata”) of the 
entity that has been identified with a DOI. 


(iii) Resolution : making the identifier “actionable” by 
providing information about what the DOI should resolve 
to, and the technology to deliver the services that this 
can provide to users. 


(iv) Policies : the rules that govern the operation of the 
system. 


These components are depicted in Fig. 3.7.


3.4.2. Open Database Connectivity (ODBC)


HTML, the de facto language of the web, does not by itself
offer the interactivity essential in a digital library implementation.
The restrictions inherent in the current WWW protocol are
overcome by employing back-end databases to retrieve
information in response to queries from users or queries
embedded as links. The methods used for connecting a
database to the web are described below:


ODBC Drivers : Open Database Connectivity (ODBC) provides
a uniform way to access databases for which compliant drivers
have been written. ODBC drivers are readily available
for well-known relational database systems, so working with
ODBC gives access to the widest variety of systems. The
technology is now available to create web pages on the fly
from data obtained dynamically from a database like
Oracle 7, Microsoft Access, Inmagic, BASIC+, etc. ODBC
drivers for important databases are already built into major
operating systems.


[Figure: the four components of the DOI system. Enumeration:
allocation of an identifier. Description: describing the digital
resource. Resolution: the Handle System allows a DOI to resolve
to any piece of current data on the Internet. Policies.]

Fig. 3.7. Components of Digital Object Identifier


Common Gateway Interface (CGI) : CGI scripts are an
extremely powerful means of achieving interaction between web
browser and server. A CGI script is a program that runs on a web
server and serves as a link between the web server and some
other program running on the server. With CGI, web interfaces
can be designed to query databases, receive results and
display them to the users. The script is held on the web
server and is activated from within a web page, usually through
HTML "forms" used for query formulation.
The HTML form, filled in by a digital library user, generates a
query for the library server. The results obtained from the
library server are then displayed with the help of the CGI
script, which generates an HTML page for displaying the
results at the user's end. Almost any programming language
can be used to write CGI scripts as long as it is supported by
the server. Perl, C / C++ and UNIX shell programming are
some of the popular choices. WWWISIS is a CGI
script that is used for web-enabling CDS/ISIS databases.
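
The cycle described above (form input, query, generated HTML page) can be sketched in Python. This is an illustrative script and not WWWISIS or any existing package: the in-memory catalogue, the form field name `q` and the function names are assumptions. It relies only on the convention that a web server passes the submitted form data to a CGI script in the QUERY_STRING environment variable and expects a header block, a blank line and then the page.

```python
import html
import os
from urllib.parse import parse_qs

# Stand-in for the library server's database; the entries are invented.
CATALOGUE = ["Digital Preservation Handbook", "Metadata Basics", "Digital Imaging Guide"]


def run_query(term):
    """Return catalogue titles containing the search term, ignoring case."""
    return [title for title in CATALOGUE if term.lower() in title.lower()]


def handle_request(environ):
    """Build a CGI response: a header block, a blank line, then an HTML page."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    term = params.get("q", [""])[0]
    hits = run_query(term) if term else []
    items = "".join(f"<li>{html.escape(title)}</li>" for title in hits)
    body = (f"<html><body><h1>Results for {html.escape(term)}</h1>"
            f"<ul>{items}</ul></body></html>")
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # A web server would set QUERY_STRING from the submitted HTML form.
    print(handle_request(os.environ), end="")
```

User input is escaped with `html.escape` before it is placed in the page, a precaution that any script echoing form data back to the browser should take.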


Java Database Connectivity (JDBC) : Java applets are
not used just for creating visual effects and animations;
database access is another popular application of Java
applets. Java applets are client-end applications
that are automatically transmitted to the client's PC over the
Internet. Java Database Connectivity (JDBC) is a
programming interface which allows applets to communicate
with database systems through servlets in a three-tier
architecture. The applets interact with the server, which in
turn interacts with the database layer. In response, the server
receives data from the database layer and sends the results
back to the client. Java is very significant since it makes the
WWW truly interactive by incorporating applications that can
be programmed, run online and distributed in a simple, safe
and portable manner. Java also provides an extensible
method to handle new data types and protocols internally.
Libsys has a Java client that offers web-based connectivity to
Libsys.


3.4.3. Uniform Resource Characteristics 


Digital objects need to be identified, described,
stored and disseminated so as to serve their purpose. Stored
in digital repositories, digital objects must have unique
identifications or names that can be used for their retrieval.
Uniform Resource Characteristics (URC), more popularly
known as metadata, provide information about an object and
are analogous to bibliographic records.
In other words, metadata is information about information
available on the web. The following three types of metadata
are associated with digital objects:


• Descriptive Metadata : These include content or
bibliographic description consisting of keywords and
subject descriptors.


• Administrative or Technical Metadata : These
incorporate details on original source, date of creation,
version of digital object, file format used, compression
technology used, object relationship, etc. Administrative
data may reside within or outside the digital object and
are required for long-term collection management to
ensure longevity of the digital collection.


• Structural Metadata : These are the elements within
digital objects that facilitate navigation, e.g., table of
contents, index at issue level or volume level, page
turning in an electronic book, etc.


The OCLC / NCSA Meta Data workshop in Dublin, Ohio 
held in 1995 proposed the core set of elements to appear in 
URC: Title, Creator, Subject and Keywords, Description, 
Publisher, Contributor, Date, Resource Type, Format, 
Resource Identifier, Source, Language, Relation, Coverage 
and Rights Management. 
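
A record built from the core set of elements listed above can be sketched as a simple data structure. The element names are taken from the list in the text; the validation rule, the plain-text output format, the function names and the sample values are assumptions made for illustration only.

```python
# The fifteen core elements, in the order given in the text.
CORE_ELEMENTS = (
    "Title", "Creator", "Subject and Keywords", "Description", "Publisher",
    "Contributor", "Date", "Resource Type", "Format", "Resource Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights Management",
)


def make_record(fields):
    """Accept only core elements and return them in the standard order."""
    unknown = set(fields) - set(CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not core elements: {sorted(unknown)}")
    return {name: fields[name] for name in CORE_ELEMENTS if name in fields}


def format_record(record):
    """Render the record as one 'Element: value' line per element."""
    return "\n".join(f"{name}: {value}" for name, value in record.items())
```

Every element is optional in this sketch, so a record may carry as few or as many of the fifteen elements as are known for the object being described.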


The increasing size and complexity of the digital information
available on the web demands methods for its organization.
Uniform and structured meta information can effectively be
employed to achieve this goal. Metadata support efficient
and effective organization, access and retrieval of information
contents in a digital library. Meta information is also used in
designing effective browsing and search interfaces for a
digital library.


The availability of metadata encoded in digital objects
reduces the effort involved in extracting, storing and accessing
key terms. Moreover, it facilitates easier and more precise
access to appropriate information. Techniques have now
been developed that support automated extraction of
catalogue information, incorporation of annotations about
items, and development of templates that permit users to
extract their own meta information.


3.5. MANPOWER TRAINING 


Digital libraries are, on the one hand, similar to
physical libraries: they involve the same issues of selection,
evaluation, access, housing, preservation and user
assistance. On the other hand, digital objects are
very different from physical objects in many ways, so a
different set of knowledge and skills is required to handle
them. Digital libraries need digital librarians to manage them.
Since digital libraries and digital objects are new, such skills
are hard to find. A key component of building a digital library
is to build a team with understanding and expertise in the various
areas relevant to digital resources. The situation requires
library and information science professionals to make a
paradigm shift to meet the challenge. This shift calls
for introducing these technologies in libraries and
training existing and potential library science professionals
in their use.


Digital libraries are amongst the most complex and
advanced forms of information systems. Digital library
development requires in-depth knowledge of digital document
imaging, distributed database management, hypertext,
information retrieval, enforcement of intellectual property
rights, integration of multimedia information services,
management and organization of multilingual collections,
information mining, electronic reference service, electronic
document delivery and selective dissemination of information.
Due to these diverse requirements, digital libraries are
emerging as a growing interdisciplinary area of research and
education for information science, computer science, library
science and a number of other related disciplines. While
computer scientists are responsible for the technical
development of digital libraries, information scientists need
to take charge of content organization, presentation, user
training and retrieval of information from digital libraries.
Librarians and information professionals, therefore, need to
be trained in the current technological demands
of digital libraries.


3.6. STEPS INVOLVED IN BUILDING DIGITAL LIBRARY 


Building a digital library is a highly specialized and
cost-intensive activity that requires inputs from diverse branches
of knowledge. It is important that the objectives, needs and
purpose of the digital library are established clearly and
beyond doubt. The digital library
proposal should define its goals, scope, benefits, costs, time
required in the developmental phase, feasibility, implementation
issues, deliverables and target users. It may be desirable to
continue with a traditional library while (i) acquiring
collections in digital media; (ii) buying access to electronic
resources; and (iii) developing subject gateways or library
portals, instead of building a digital library with its own content
development activity. Acquiring digital media, buying
access to e-resources or developing subject gateways would
save the cost of digitization and the ongoing administrative cost.
However, once the decision to create a digital library is taken,
due importance should be given to factors such as
sustenance, reusability, interoperability, verification and
documentation, both for users and for developers. The
steps given below may be considered as pre-requisites:


3.6.1. Planning Digital Library 


Careful planning of a digital library would bring clarity to
the project and save cost, time and human expertise. Planning
a digital library may further include the following steps:


Feasibility : It is important to conduct a feasibility study
of the digital library project. Feasibility should be
established not only in terms of the availability of tools and
expertise and the volume / number of documents involved in
the process of digitization, but also in terms of the target
audience, the demand for the material being digitized and
users’ requirements. The feasibility study should also reflect
whether the library can take up the project in-house or
whether it should be outsourced.


Network and Computing Infrastructure Requirements:
The requirements for creating and hosting the digital library,
in terms of server-side hardware, network components and
server-side software components, should be planned along with
their financial implications. The connectivity and bandwidth
required for hosting the digital library also need to be
planned.


Human Resources Planning : Human resources need
to be planned in terms of the staff time involved, training of
existing staff and recruitment of new staff with the desired
skills. Human resources planning would depend on whether the
library is going for in-house digitization or for outsourcing
the process of digitization. Project management continues to
be an important issue even if the digitization work is being
outsourced. The management of the project may be divided
among groups with defined responsibilities. Communication
between groups and a reporting structure may be laid down to
keep channels of communication open and to form the basis for
sound formative evaluation of the project.


Managerial Planning : This would essentially involve
sequencing the various tasks, managing their time and
monitoring the project. Activities that need managerial
planning may include conducting the feasibility study,
procurement of equipment, recruitment of manpower,
digitization (whether outsourced or done in-house), IPR and
rights management issues, integration and organization of
content, finding a market, and launching and marketing of
services. Flow diagrams, PERT, CPM, SWOT analysis and other
management techniques may be deployed at this stage.
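As a rough illustration of how the critical path method (CPM) mentioned above can help in sequencing tasks, the sketch below finds the longest chain of dependent activities in a small project. The task names, durations (in weeks) and dependencies are hypothetical and are chosen only to mirror the activities listed in this section; a real project would substitute its own estimates.

```python
from functools import lru_cache

# Hypothetical tasks: name -> (duration in weeks, prerequisite tasks).
TASKS = {
    "feasibility_study": (4, []),
    "procure_equipment": (6, ["feasibility_study"]),
    "recruit_staff": (8, ["feasibility_study"]),
    "rights_clearance": (10, ["feasibility_study"]),
    "digitization": (20, ["procure_equipment", "recruit_staff"]),
    "organize_content": (6, ["digitization", "rights_clearance"]),
    "launch_services": (2, ["organize_content"]),
}


def critical_path(tasks):
    """Return (total duration, tasks on the longest dependency chain).

    Assumes the dependency graph has no cycles.
    """

    @lru_cache(maxsize=None)
    def earliest_finish(name):
        duration, prerequisites = tasks[name]
        best_time, best_path = 0, ()
        for prerequisite in prerequisites:
            time, path = earliest_finish(prerequisite)
            if time > best_time:
                best_time, best_path = time, path
        return best_time + duration, best_path + (name,)

    total, path = max(earliest_finish(name) for name in tasks)
    return total, list(path)


total_weeks, path = critical_path(TASKS)
print(total_weeks, path)
```

Tasks on the returned path have no slack, so a delay in any of them delays the launch; the other tasks can slip by some amount without affecting the end date.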


Financial Planning : Financial planning is very crucial.
The cost of migration from one medium to another and from
one computer to another should be built in. The cost of
hosting the services and their maintenance should also be
planned, besides the other aspects mentioned above.


3.6.2. Selection of Material for Digitization and “Born Digital” 


Identify, select and prioritize the documents that are to
be digitized. If your organization itself is generating
content, lay down strategies to capture “born digital” data.
Documents that are already available in digital form can
easily be converted into other formats. If the selected
material is from external sources, intellectual property
rights issues need to be resolved. It is important to obtain
permission from the publishers and data suppliers for
digitization if the material being digitized is not in the
public domain. Moreover, a decision should be taken on
whether to OCR the digitized images. Documents selected for
digitization may already be available in digital format. It is
always more economical to buy e-media, if available, than to
convert print material. Moreover, over-sized material,
deteriorating collections, bound volumes of journals,
manuscripts, etc. would require highly specialized equipment
and highly skilled manpower.


3.6.3. Technical Specifications 


The technical specifications should be decided well
before the actual process of digitization commences,
irrespective of whether digitization is being done in-house or
is being outsourced. Begin by reviewing the existing types,
formats, standards and practices. Prepare draft specifications
and test them through demonstrations. Re-draft the
specifications in the light of experience. Lay down
specifications for metadata creation for digital objects as
well as for the digital collection. Digital objects and digital
collections typically require descriptive, structural and
administrative metadata.
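To make the three kinds of metadata concrete, the following sketch shows one possible record for a single digitized item and a small check that all three sections are filled in. The field names and values are hypothetical and do not follow any particular metadata standard; they only illustrate what each category might hold.

```python
import json

# Hypothetical record for one digitized item; field names are illustrative.
record = {
    "descriptive": {  # what the item is, for search and discovery
        "title": "Annual Report",
        "creator": "Example Institute",
        "subject": ["administration"],
    },
    "structural": {  # how the parts of the item fit together
        "pages": ["page_001.tif", "page_002.tif"],
        "page_order": [1, 2],
    },
    "administrative": {  # how the item was made and who may use it
        "scan_resolution_dpi": 300,
        "file_format": "TIFF",
        "rights": "permission obtained from publisher",
    },
}

REQUIRED_SECTIONS = ("descriptive", "structural", "administrative")


def missing_sections(item):
    """List the metadata sections that are absent or empty."""
    return [section for section in REQUIRED_SECTIONS if not item.get(section)]


print(json.dumps(record, indent=2))
print(missing_sections(record))
```

A check of this kind could be run on every record before it is loaded, so that incomplete specifications are caught during the test phase rather than after launch.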


3.6.4. Implementation 


Once the decisions on where to start, how to start and whom
to use for getting the job done are taken, it is time to
implement the digital library project. The implementation
includes the following steps.


Purchase of Hardware and Software : The technology chosen
and the kinds of equipment required may be purchased. Data
storage and backup devices may be purchased as required.
Software for search, access and rights management may also
be acquired or developed.


— Acquire and install Hardware and Software. 


— Acquire and install the network required for the digital
library. Consider bandwidth requirements, which depend upon
the media offered by the digital library. While simple text
requires the least bandwidth to deliver content, images and
video require large bandwidth.


— Acquire and install other components as required. 
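The dependence of bandwidth on media type noted in the list above can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a sizing method prescribed by the text: the object sizes, the number of concurrent users and the acceptable delivery time are all assumed values.

```python
def required_mbps(concurrent_users, object_size_mb, delivery_seconds):
    """Megabits per second needed to deliver one object of the given size
    (in megabytes) to every concurrent user within the delivery time."""
    megabits = concurrent_users * object_size_mb * 8
    return megabits / delivery_seconds


# Hypothetical object sizes in megabytes for three kinds of media.
MEDIA_SIZES_MB = {"text page": 0.05, "page image": 2.0, "video clip": 50.0}

for medium, size_mb in MEDIA_SIZES_MB.items():
    print(f"{medium}: {required_mbps(20, size_mb, 10):.1f} Mbit/s")
```

The estimate ignores protocol overhead, caching and compression, so it is best read as a comparison between media types rather than as a precise requirement.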


Contents Creation / Building : Steps involved here 
include: 


— Conversion of datasets that are “born digital.” 
— Conversion of existing print media into digital format. 


— Identify vendors if digitization is to be outsourced to
different sources.


Employ Manpower and Train Existing Manpower :
Already trained persons may be hired from outside for
implementation of the programme. Otherwise, training may
first be given to the existing staff, after which
implementation can proceed.


Digital Resource Organization and Access
Infrastructure : Digital media created in the process
mentioned above need to be organized with the aim of making
the collection more usable and accessible and of facilitating
value-added services such as search and discovery utilities,
and browse and interpretive interfaces.


3.6.5. Testing and Integration 


Successful deployment of digital library services
requires elaborate testing of services and integration of
components. Testing and integration may go through a pilot
test first. Detailed documentation is required to support the
needs of collection managers on various implementation
issues. Access and delivery mechanisms should also be
tested with the involvement of an active user group.


3.7. DISSEMINATION: EXTENDING DIGITAL LIBRARY 
SERVICES 


On successful completion of the testing and integration
phase, it is time to launch the digital library services.
Digital libraries are seen as systems that incorporate not
only digital contents, but also many value-added services
ranging from search and discovery utilities, to browse and
interpretive interfaces, to specialized preservation and
dissemination protocols. These value-added services
facilitate management of the digital content and make it more
usable and accessible to end users. Marketing the products
and services of the digital library would involve preparation
of general publicity material, an elaborate Web site,
discussion lists for user groups, and contact details with
e-mail and telephone access. Press releases, special
presentations at conferences and a launch seminar may also be
organized to release the services and products of the digital
library. Special Web-based tutorials and instructional
programs may be designed for users and information providers.
Paper-based and Web-based manuals and documentation are also
required to support the needs of end users for resource
discovery and optimal use.


Issues related to the authentication of users also need
to be resolved. Although IP-enabled authentication is the most
suitable for users as well as publishers, other
authentication methods may also be adopted, if required. The
digital library services for librarians may include
mechanisms to link catalogue records to the digital full text
and to link electronic databases to library holdings. The
digital library interface may also provide services like “My
Profile”, “My Journals”, “My Alert”, etc.
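IP-enabled authentication, as mentioned above, generally means that access is granted when a request comes from an address inside a range registered for the institution. A minimal sketch of that check follows, using Python's standard `ipaddress` module. The address ranges are reserved documentation ranges used here as hypothetical campus networks, and the function name is illustrative.

```python
import ipaddress

# Hypothetical campus address ranges registered with a publisher.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_authorized(client_ip):
    """Return True if the client address falls inside a registered range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


print(is_authorized("192.0.2.17"))
print(is_authorized("203.0.113.5"))
```

Users outside the registered ranges, such as those working from home, would need one of the other authentication methods the text alludes to.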


Fundamentally, digital library services include Internet
Services, Newspaper Clipping Services, Digitization of Special
Collections, Service Examination Cell, Digital Reference
Services, Enquiry Service Sections and Inter Library Loan.


3.7.1. Internet Services 


The use of the Internet in libraries is rapidly increasing
and is changing the traditional functions and services of
libraries as well as their role. The Internet is being used as
an efficient medium for accessing, storing and disseminating
information worldwide.


Fig. 3.8. Services of Digital Library (the library linked to
Internet Services, Newspaper Clipping Services, Digitization
of Special Collections, Service Examination Cell, Digital
Reference Services, Enquiry Service Sections and Inter Library
Loan Services)


The Internet is also being used as a very important tool to
access free information sources and services. Free online
information sources and services can be accessed on the
Internet without paying any subscription charges to the
publishers, and users do not need to be members of the
organization. Users can access and download the required
information from these sources using their computers.


The librarian can list the free information sources and
services in the Internet service section of the library. The
librarian should maintain an index of online e-journals,
e-books and other information sources that are freely
available on the Internet, to save the library’s space and
money and the users’ time in searching for information.


3.7.2. Newspaper Clipping Services 


Mohinder Singh says, “to continue the tradition of 
collecting and providing access to newspaper clippings in the 
networked environment, the DESIDOC embarked upon the 
digitization of newspaper clipping project in 2000. They are 
providing Newspaper Clipping Service both in paper and 
electronic form which includes seventeen national 
Newspapers in Hindi and English”. Users interested in
newspaper information can obtain it from these archives
because of the librarian’s efforts in selecting, cutting,
storing and indexing the articles.


To provide these services, the librarians can go through
all the newspapers and mark relevant articles every day. The
selected articles are scanned and saved in a database in the
required format, indicating the name of the newspaper and the
publication date; after this process the marked newspaper is
kept in a properly maintained file.


The librarian can then go through the online newspapers,
select the relevant articles and save them in the same
database. That is one way in which a newspaper clipping
collection is developed. For access and retrieval, the
newspaper clipping service provides users with browsing and
searching facilities on various fields such as newspaper name,
title, author, publication date, etc. The library arena offers
a unique challenge to an emerging breed of librarians: to
combine the principles, practices and tools of information
management to create new information products and services.


Fig. 3.9. Internet Services (Internet sources such as
encyclopaedias reach the user terminals through the main
server)


Fig. 3.10. Newspaper Clipping Services (browsing the
newspaper, selection of related news, scanning of newspaper
clippings, saving the selected clippings as a database, and
access from user terminals)
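The field-wise browsing and searching described for the clipping service can be sketched with a small in-memory database. The example below uses Python's built-in `sqlite3` module; the table layout, the newspaper names and the clippings themselves are hypothetical and are not taken from any actual clipping service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clippings "
    "(newspaper TEXT, title TEXT, author TEXT, published TEXT, image_file TEXT)"
)
# Hypothetical clippings; 'published' uses ISO dates so text sorting works.
conn.executemany(
    "INSERT INTO clippings VALUES (?, ?, ?, ?, ?)",
    [
        ("Example Times", "New radar tested", "A. Reporter", "2000-03-14", "c001.tif"),
        ("Sample Herald", "Library goes digital", "B. Writer", "2000-03-15", "c002.tif"),
        ("Example Times", "Digital archives open", "B. Writer", "2000-04-02", "c003.tif"),
    ],
)

SEARCHABLE_FIELDS = {"newspaper", "title", "author", "published"}


def search(field, value):
    """Return titles of clippings whose field contains value, oldest first."""
    if field not in SEARCHABLE_FIELDS:
        raise ValueError(f"cannot search on {field!r}")
    rows = conn.execute(
        f"SELECT title FROM clippings WHERE {field} LIKE ? ORDER BY published",
        (f"%{value}%",),
    )
    return [title for (title,) in rows]


print(search("newspaper", "Example Times"))
print(search("title", "digital"))
```

The field name is checked against a fixed set before it is placed in the query, while the search value is passed as a parameter, so user input never becomes part of the SQL text.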


3.7.3. Digitization of Special Collections 


Special collections form the core of the primary research
collection of a library. Whatever the format, the common
features of special collections are that they are unique, rare,
limited in availability and in need of special handling, or
are simply something that cannot be located in the open
stacks. Equally important is that the material is valued as an
artifact. Special collections may use digitization as a
preservation technique, but its real value lies in the
increased access it gives to the collections. Almost all
major libraries of the world have digitization programmes to
provide greater access to their special collections through
the digitization of material including photographs, maps,
pictures, cartoons, etc.


There is a growing number of collaborative digitization
initiatives within major libraries, including the British
Library, which already has an estimated four terabytes of
digitized information. Special collections of the British
Library are digitized to maximize their use by facilitating a
greater volume of networked access and by providing the
enhanced functionality intrinsic to the digitized items.


A number of digitization projects have also been
undertaken by the National Library of Australia for the
long-term preservation of traditional documentary materials.
Digitization provides access to electronic information
resources such as images, databases, recordings, theses,
manuscripts and the full text of periodical articles indexed
by the library.


Fig. 3.11. Digitization of Special Collections (digitized
material such as dissertations and others is held on the main
server and delivered through user terminals, CDs and
printouts)


Fig. 3.12. Service Examination Cell (service exam cell with
service exam source documents, a group study cell and a model
exam cell)


Special collections of the National Library of Australia,
including photos, cartoons, drawings, negatives, postcards,
maps, printed music, etc., have been digitized under a major
digitization programme aimed at providing greater access to
its collection.


3.7.4. Service Examination Cell 


Librarians can arrange user motivation programmes, and
they have to arrange a separate programme for a “service
examination cell”. That cell may hold documents for service
examinations, for example UGC, GATE, GMAT, IES, IAS, IFS,
CAT, etc. Arrangements for a “Group Studies Cell” can also be
made. The librarians may also conduct sessions on how to
prepare for and how to write the service examinations. This
is most useful for outgoing students in the university system.


3.7.5. Digital Reference Services 


The library and information profession is also facing
the challenges of the electronic age and has been transformed
by technology. Advancements in information technologies have
brought about incredible changes in almost every aspect of
information services, and reference services are no
exception. Easily accessible digital information has rapidly
become one of the hallmarks of the Internet. The Internet has
also proved to be a cost-effective and efficient alternative
to traditional communication methods. All these developments
gave way to a new range of reference services. In this series
of developments, digital reference is the latest trend in the
digital era.


Dr. S. R. Ranganathan defines reference service as
“Personal service to each reader in helping him to find the
documents answering his interest at the moment
pin-pointedly, exhaustively and expeditiously”. Today many
academic libraries and information centers use e-mail
extensively to provide online reference service to their
users. E-mail reference has the advantage of allowing more
complete answers than are possible at a busy reference desk.
When answering a question through e-mail, the reference
librarian usually has more time to think about the question
and the user’s information needs and, if necessary, to consult
colleagues who have more relevant expertise or knowledge.


Users may also ask a reference question by telephone, in
which case the librarian can answer from information readily
available in the database. Users can also come in person and
get the relevant documents for reference.


Fig. 3.13. Digital Reference Services (the digital reference
service section, with reference documents and a reference
librarian online, offers an online enquiry service, an e-mail
enquiry service, a chat-based enquiry service, a virtual
reference service and a telephone enquiry service)


Chat-based reference is also available. Chat, also referred
to as instant messaging, is real-time communication between
two or more computer users over the Internet. Every keystroke
a chat user makes is instantly transmitted and appears on the
monitors of all other users in the same chat session. Chat is
a very popular means of communication over the Internet.


Real-time reference live on the web is the latest trend 
in virtual reference. Already some librarians are providing live 
web reference services to their users. 


Librarians and information professionals are required
to cope with the new technological changes, but at the same
time one should not fear that the emerging technology-based
services will completely replace the traditional services;
rather, they are emerging as supplementary services that
improve the dissemination of information among the student
community.


3.7.6. Enquiry Service Sections


The librarians can arrange an enquiry service section. It
needs no separate place; the circulation section can be
utilized. Alternatively, the librarian can arrange a separate
system with a user terminal. Both ways are very useful in
helping the user community to locate documents.


Fig. 3.14. Enquiry Service (circulation section, enquiry
section and computer searching)


3.7.7. Inter Library Loan Services


Inter library loan facilities can be arranged by the
librarians. Costly documents, rare documents, government
reports, abstracts, patents, etc. are all kept in one place.
Documents that are not available in a library, especially in
university affiliated colleges or small libraries in the
public library system, may be requested through inter library
loan from the library that holds them. In the digital
environment, these documents can be scanned and sent as
e-mail attachments.


The aim of almost all the digital technology projects
mentioned above has been to enhance access to rare and
unique primary materials that may otherwise have limited
scholarly and educational importance at their geographical
location, to aid in the preservation of these special
materials through the creation of digital surrogates, and to
create learning opportunities for remote users, including
those in colleges, universities and schools and the users of
distance learning. For this, we need a well designed digital
library. Designing a digital library is not a sudden process;
it involves various processes.


Fig. 3.15. Inter Library Loan Service


3.8. ARCHITECTURES AND INTERCONNECTING 


Since the field of digital libraries is young, there still is 
active investigation regarding architecture, interconnection, 
and interoperability. Fig. 3.16 shows one, rather high-level, 
decomposition of a digital library into components. 


The most commonly used architecture for a digital library
is the four-layered architecture shown in Fig. 3.17. The
topmost layer is the service layer, which is responsible for
providing information services to the end users. It provides
a user-friendly interface so that users can select the
information they require at the click of a button. The second
layer contains the software tools required for providing the
user interface, querying the digital repository, making links
to related information and providing the output in the desired
formats. The third layer is the backbone of the digital
library: it contains the digital information, in the form of
metadata held in databases and of files. The bottommost layer
is an important one that provides the tools and protocols for
interoperability.
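One way to picture the four layers is as four small classes, each of which talks only to the layer beneath it. The sketch below is purely illustrative: the class and method names are invented for this example, the "database" is a Python dictionary, and JSON stands in for whatever exchange format an actual interoperability protocol would use.

```python
import json


class InteroperabilityLayer:
    """Bottom layer: exchanges records in a shared format (JSON is assumed)."""

    def export(self, record):
        return json.dumps(record, sort_keys=True)


class InformationBase:
    """Third layer: metadata in a 'database' (a dict) plus the files."""

    def __init__(self):
        self.metadata = {}
        self.files = {}

    def add(self, doc_id, metadata, content):
        self.metadata[doc_id] = metadata
        self.files[doc_id] = content


class Tools:
    """Second layer: searchware that queries the repository."""

    def __init__(self, information_base):
        self.information_base = information_base

    def search(self, term):
        term = term.lower()
        return sorted(
            doc_id
            for doc_id, metadata in self.information_base.metadata.items()
            if term in metadata["title"].lower()
        )


class Services:
    """Top layer: the operations offered to the end user."""

    def __init__(self, tools, interoperability):
        self.tools = tools
        self.interoperability = interoperability

    def find_titles(self, term):
        base = self.tools.information_base
        return [base.metadata[doc_id]["title"] for doc_id in self.tools.search(term)]

    def export(self, doc_id):
        record = self.tools.information_base.metadata[doc_id]
        return self.interoperability.export(record)


base = InformationBase()
base.add("d1", {"title": "Digital Preservation"}, b"...")
base.add("d2", {"title": "Library Consortia"}, b"...")
services = Services(Tools(base), InteroperabilityLayer())
print(services.find_titles("digital"))
print(services.export("d2"))
```

Because the service layer never touches the stored files directly, the repository or the exchange format could be replaced without changing what the user sees.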


Fig. 3.16. Digital Library Design (flow covering user
requirement, identification/selection of documents, copyright,
scanning/formatting, data conversion, identification of
digital documents, metadata/links, archives and user interface
development)


A digital library should be interoperable. To be
interoperable means to be actively engaged in the ongoing
process of ensuring that the systems, procedures and culture
of an organization are managed in such a way as to maximize
opportunities for the exchange and re-use of information,
whether internally or externally. Hence, in a digital library
environment, interoperability is the ability of information
systems to operate in conjunction with each other,
encompassing communication protocols, hardware, software
applications and data compatibility layers.


Fig. 3.17. Four-layered Architecture of Digital Library
(Services: user interfaces; Tools: searchware that mediates
access to digital information; Information base: digital
repository of databases and files; Protocol: digital library
interoperability)


Interoperability is a critical problem in the networked
environment, especially for digital libraries, given the
increasing number of diverse computer systems, software
applications, file formats, information resources and users.
Interoperability is the ability of digital library components
and services to be functionally and logically interchangeable
by virtue of their having been implemented in accordance with
a set of well-defined, publicly known interfaces. In this
model, different services and components can communicate with
each other through open interfaces, and clients can interact
with them in an equivalent manner. The ultimate goal of
interoperability is to allow the components of a digital
library to be created and developed independently and yet be
able to call on one another efficiently and conveniently.


Interoperability in digital library implementation
addresses the challenges of creating a general framework
for information access and integration across many domains.
Digital libraries created using principles of
interoperability result in repositories of digital content
that may have different attributes but can be treated in the
same manner owing to their shared interface definition. There
are several approaches to achieving interoperability in
digital library implementation. Some of the common
approaches are:


(i) Standardization (e.g. schema definitions, data models
and protocols),


(ii) Distributed object request architectures (e.g. CORBA), 
(iii) Remote procedure calls, 
(iv) Mediation (e.g. gateways, wrappers), and 


(v) Mobile computing (e.g. Java Applets). 


3.8.1. Standardization 


Standardization is a proven approach to achieving
interoperability. MARC and its different variants, CCF and
Dublin Core are some of the known standards for the
bibliographic description of records. Z39.50 is a known
standard for information retrieval. “Technical infrastructure
of a digital library” deals with standards and protocols on
various aspects of digital libraries.
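As an illustration of what a standardized bibliographic description looks like, the sketch below builds a small Dublin Core record for this manual using Python's standard XML library. The namespace URI is the one published for the Dublin Core element set; the wrapping `record` element and the helper function name are assumptions made for this example, and only a handful of the fifteen Dublin Core elements are shown.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding one Dublin Core element per value."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for name, values in fields.items():
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = value
    return root


record = dublin_core_record(
    {
        "title": ["Manual of Digital Libraries"],
        "creator": ["Dhiman, Anil Kumar", "Rani, Yashoda"],
        "publisher": ["Ess Ess Publications"],
        "language": ["en"],
        "type": ["Text"],
    }
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Any system that understands the Dublin Core element names can read such a record without knowing how the originating library stores its catalogue internally.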


3.8.2. Families of Standards 


The families of standards approach offers the choice
of implementing one or more of several standards. The
International Organization for Standardization (ISO) standard
for Open Systems Interconnection (OSI) has created an
interoperability framework based on the family of standards
approach. OSI, in its seven-layer structure, provides a family
of standards concerned with a given set of interoperability
issues in the area of interconnection.


Fig. 3.18. Digital Library Interoperability (on the client
side, a client application calls the interoperability protocol
interface of a client transport module; across the network
boundary, via CORBA, HTTP, etc., a server transport module
passes requests through the interoperability protocol
interface to a library service proxy in front of the
information source)


3.8.3. Specification-based Interaction 


Interoperability can also be achieved by describing the
semantics and structure of all data and operations.
Specification-based interaction circumvents the need for
mediation systems. Well-developed enabling technologies for
achieving this goal include Agent Communication Language
(ACL).


3.8.4. Mediation 

Interoperability can also be achieved by deploying
mediation machinery and interfaces for the translation of data
formats and interaction modes between components. In the
interconnection of diverse networks, network gateways play
the role of mediators. However, translation in the sense of
simple mapping is not always sufficient to achieve complete
interoperability. For example, one of two digital libraries
may completely lack certain data types or operations and,
therefore, the two cannot interoperate without further work.
Mediation interfaces can, however, be designed to augment
functionality and services, for instance by searching two
digital libraries and presenting the results with their own
value additions. Such mediation facilities are called
“wrappers” or “proxies”. Mediation technology thrives on
standardization. For example, a single mediation system can
cover all Z39.50-compliant sources at once.
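The wrapper idea can be sketched in a few lines. In the example below, two hypothetical libraries describe their documents with different field names; each wrapper translates its source into a common shape, and a mediator searches both and adds the source name as a simple value addition. All names and records are invented for illustration, and no real protocol such as Z39.50 is involved.

```python
# Two hypothetical sources that describe documents with different field names.
LIBRARY_A = [{"ti": "Digital Preservation Basics", "au": "A. Author"}]
LIBRARY_B = [{"title": "Metadata for Digital Objects", "creators": ["B. Writer"]}]


class WrapperA:
    """Translates Library A records ('ti', 'au') into the common shape."""

    name = "Library A"

    def search(self, term):
        return [
            {"title": record["ti"], "authors": [record["au"]]}
            for record in LIBRARY_A
            if term.lower() in record["ti"].lower()
        ]


class WrapperB:
    """Translates Library B records ('title', 'creators') into the common shape."""

    name = "Library B"

    def search(self, term):
        return [
            {"title": record["title"], "authors": list(record["creators"])}
            for record in LIBRARY_B
            if term.lower() in record["title"].lower()
        ]


def mediated_search(term, wrappers):
    """Search every wrapped source and label each hit with its origin."""
    results = []
    for wrapper in wrappers:
        for record in wrapper.search(term):
            results.append({**record, "source": wrapper.name})
    return sorted(results, key=lambda record: record["title"])


for hit in mediated_search("digital", [WrapperA(), WrapperB()]):
    print(hit)
```

Adding a third library means writing one more wrapper; the mediator and the client code that calls it stay the same.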


3.8.5. Mobile Computing 


Mobile computing, or mobile functionality, consists of
software agents that travel over the network to the sites
where they access the services they need. These software
agents then return to their original sites with the results of
their work. Java applets and servlets facilitate such mobile
functionality, which delivers new capabilities to client
components at run time. Instead of depending upon
standardization or third-party mediation, mobile functionality
accomplishes interoperability by exchanging code that
facilitates communication amongst components.

The key goals of the interoperability protocol are to
make it very easy to build clients and to construct Library
Service Proxies (LSPs) that wrap arbitrary sources.
Implementers of this protocol need to produce only the client
application and/or the library service proxy; everything else
is taken care of by standard libraries. It is important to
note here that client applications and services need not be
aware of the methods used for transporting operation requests
and replies. The transmission of requests and replies might be
accomplished through different ‘transport bindings’: CORBA,
HTTP or, maybe in the future, some other means. Client
applications are unaffected by the transport binding. A client
application merely creates a client transport module object in
its local address space. This module implements the
interoperability protocol interface. The client then invokes
the interoperability protocol operations on this local module.
The module packs the operations for transport via one of the
supported interoperability protocol transport bindings. Any
given client transport module instance uses one particular
transport binding. If a different binding is to be used, the
client application simply instantiates a different class of
transport module.


Interoperability becomes an even more critical problem
for Indian digital libraries, which differ widely from one
another. There is the further problem of sharing resources
across languages, as the resources in Indian libraries are in
many languages, viz. English, Hindi, Sanskrit, Marathi,
Gujarati, Oriya, Bengali, Punjabi, etc. Thus, there is a
problem of interoperability between multilingual digital
library resources. Many TrueType fonts are being used to
represent the Indian languages on the web.


But fonts alone are not a sufficient tool for implementing
multilingual support. ISCII is also being used as a standard
to represent the Indian languages on the web as well as in
databases. However, Unicode is the most comprehensive
solution, as it covers the scripts of most of the world’s
written languages, including the Indian languages. It is
unambiguous and free of overlap, since it assigns a unique
value to each character of every script it covers.
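The point that Unicode gives every character its own value can be checked directly. The short sketch below prints the code point and official Unicode name of the letter KA in five Indian scripts, using Python's standard `unicodedata` module; the choice of this particular letter and the dictionary labels are only for illustration.

```python
import unicodedata

# The letter KA in five Indian scripts, written as Unicode escapes.
KA_LETTERS = {
    "Devanagari (Hindi, Sanskrit, Marathi)": "\u0915",
    "Bengali": "\u0995",
    "Gurmukhi (Punjabi)": "\u0A15",
    "Gujarati": "\u0A95",
    "Oriya": "\u0B15",
}

for script, letter in KA_LETTERS.items():
    print(f"{script}: U+{ord(letter):04X} {unicodedata.name(letter)}")

# Every letter has a distinct code point, so mixed-script text is unambiguous.
code_points = {ord(letter) for letter in KA_LETTERS.values()}

# Text in several scripts survives a round trip through one encoding (UTF-8).
mixed_text = "".join(KA_LETTERS.values())
round_trip = mixed_text.encode("utf-8").decode("utf-8")
```

Because the code points never overlap, a single database column or web page can hold records in several Indian scripts at once without relying on a particular font to tell them apart.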




Let us have a look at some hybrid library services.


3.9. HYBRID LIBRARY SERVICES 


The original concept of a digital, or so-called virtual,
library evolved from the view that electronic information
storage and access was the full scope of service for the new
library. The original perspective was of a service oriented
to files and ‘resource discovery’ based on file naming and
retrieval algorithms. The role of reference in digital or
virtual libraries can be explored on the basis of both the
original concepts and the lessons learnt about reference in
the printed and the digital library. The early concepts of a
digital library were based on ‘seamless’ automated access to
information without traditional reference services.


The purposes of a hybrid virtual library system are: 


— to expedite the systematic development of: the means 
to collect, store, and organize information and 
knowledge in digital form; and of digital library 
collections in North America; 


— to promote the economical and efficient delivery of 
information to all sectors; 


— to encourage co-operative efforts which leverage the 
considerable investment in research resources, 
computing and communications network; 


— to strengthen communication and collaboration 
between and among the research, business, 
government, and educational communities; 


— to take an international leadership role in the generation
and dissemination of knowledge in areas of strategic 
importance; and 

— to contribute to the lifelong learning opportunities of all 
users. 




Hybrid libraries can also be seen as located on a 
continuum between the conventional and digital library, where 
electronic and paper-based information sources are used 
alongside each other. The challenge associated with the 
management of the hybrid library is to encourage end-user 
resource discovery and information use, in a variety of formats 
and from a number of local and remote sources, in a 
seamlessly integrated way. The hybrid library should be 
“designed to bring a range of technologies from different 
sources together in the context of a working library, and also 
to begin to explore integrated systems and services in both 
the electronic and print environments.” 


The hybrid library should not, then, be seen as nothing 
more than an uneasy transitional phase between the 
conventional library and digital library but, rather, as a 
worthwhile model in its own right, which can be usefully 
developed and improved. This model contains the natural 
reference librarian for the print world, allows for the existence 
of a ‘Cybrarian’ for the electronic environment but does not 
yet have a multi-skilled reference service at its core. Among
the definitions that appeared after this evolution in thinking
is that of Paul Duguid, who described a digital library as “an
environment to bring together collections, services, and 
people in support of the full life cycle of creation, 
dissemination, use, and preservation of data, information and 
knowledge”. 


A full understanding of the role of reference is emerging 
now but a well-developed service model is still to be 
articulated. The concepts embedded in the hybrid library and 
emerging knowledge management field offer the greatest 
insights at present. Exploring the nature of reference
librarianship in the hybrid library involves two perspectives:
that of the users or clients and that of the library. The
differences and similarities in the expectations of these two
groups give an indication of how reference services can 
operate in the digital environment. 


Three online services (electronic reference enquiry
approaches within the library and two collaborative
approaches) are then assessed against these expectations
as possible service models.


3.9.1. Digital Reference 


Digital reference means the mediated, one-on-one 
service that intervenes—and stands ready to intervene—at the 
information seeker’s point of need. That need is the one that 
Brenda Dervin’s research has taught us about: every 
information seeker’s universal predicament of wanting to 
move forward but being unable to progress until some missing 
information is found. Information seekers want that gap filled 
with as little interruption as possible so they can continue 
where they left off. It is a need that will not go away. From
the library's standpoint, there are two sides to this problem. 
One is how to ensure that clients who use a reference service 
get up-to-date assistance that integrates paper and electronic 
resources. The other side involves how to reach the user who 
has a question but no obvious place to ask it. 


Certainly reference is in flux, but the emphasis is not 
on answering the stuck client's question but on teaching in 
groups and in-depth consultation by appointment. The irony 
is that at a time when we are in a position to provide the ideal 
in reference service, we are making moves to abandon the 
service. It is as if there were an underlying premise at work 
that if we teach the new information literacy skills, our clientele 
would not have questions. Of course, it is more complicated 
than that, but the signs are pointing to a trend in mediated 
reference service forecasting a future that, if put on a graph, 
would be a downward slope to oblivion. It is important
to begin putting our heads together to reverse the trend and
pave the way toward ensuring top-notch reference services 
to users, whether they are in or out of their libraries. 


If you are in a library with a busy reference desk that 
deals in the main with substantive queries, you may be reading 
this with a skeptical eye. But do not let your immediate safe- 
feeling situation lull you into closing your eyes to signposts 
on the horizon. From this standpoint, “busy” does not negate 
our concerns; it just confirms how much the service is needed 
by your walk-in clients. Or perhaps you are taking longer to 
answer fewer questions in a more complex information 
landscape, or you have fewer staff at the desk than you used 
to, or your user population is not yet using digital resources, 
or maybe your library is so difficult to figure out independently 
that patrons are forced to ask you questions. 


Guides for readers, such as Pathfinders to subjects, 
information sheets on searching for types of materials such 
as journal articles, finding aids for manuscript collections and 
documentation on searching have been produced to enable 
users to research effectively. They have been a key 
component in the information literacy programmes of libraries, 
in some cases being the only part of information literacy or 
training programmes that have been available in all the hours 
that the library is open. In the networked environment libraries 
have made guides and pathfinders available online, often
integrated with digital resource lists, such as at the National 
Library and in OCLC’s CORC project. This first stage of 
evolution of digital self-help or unmediated reference was a 
natural first step. Guides were already in electronic form, 
generally as word processing documents and had been tested 
with the user population and developed over many years. 


Mediated reference enquiries in the digital world are 
those which are expressed in electronic form. The most 
commonly used methods are e-mail or web form, now widely 
available in research libraries. Research into use of the digital
reference service indicates a significant uptake of this medium 
for transmitting reference enquiries. As an example the 
increase in electronic reference enquiries shows a consistent 
pattern. 


For unmediated reference services general web site 
statistics show a steady increase. Differentiating reference 
products from the web site in general has commenced with
detailed reports available through products such as WebTrends.
For example, the National Library’s Reference and 
Newspaper pages are in the top three directly accessed. The 
Australian newspapers and Newspapers home page are 
accessed over 10,000 times per month. 


In understanding digital reference needs of users 
librarians have been able to take account of research into 
user needs in mediated on-site reference. There are some
major differences in the patterns of electronic enquiries,
as well as some differences in characteristics in the networked
environment. Perhaps the most obvious is the nature of the
user’s relationship to the library. When users had to physically
visit or write to a library they had a relatively clear 
understanding of the role of the particular library and their 
relationship with it. 


If they were studying at a campus the educational nature 
of their information needs was relatively obvious in their use 
of the academic library. In a research library the visible 
collection gave a strong presentation indicating the subject 
interest of the organization. Users visiting the library in various 
organizations were explicitly reminded of the nature and 
purpose of the library, through location, signage and 
collections. Users have experienced library services through 
a number of different facets of their life. The fact that many 
users used a wide range of libraries because of the distribution 
of collections has been well established. 


While many users visit a range of research and
academic libraries to support their on-going information
needs, others have different roles with various libraries. In a
typical example a user may work in an agency and use that 
library for work related research, be studying at an academic 
institution and use that library for their study, use the National 
Library to supplement other libraries both for academic and 
work related research and use the local public library for 
recreation as well as supplementing study materials. 


In the print world users may well have ‘belonged’, from
the library’s point of view, to four or five or more libraries, segmenting
their use through visits. The concept of ‘My library’ is then a
network of libraries, each different and potentially confusing. 
A networked environment allows a different perspective to 
be taken to library services. The clues that exist in the physical 
library to orient users and make clear the services and 
collections available are much more elusive and complex in 
the electronic environment. Users also have the opportunity 
for an immediacy of service, which means their interaction is 
effectively ‘seamless’. 


In this model reference service is on tap, clearly visible 
with a turnaround time of minutes rather than hours. For virtual 
library services clients are required to navigate around library 
web sites. The challenge of determining the details of subject, 
geographic and temporal coverage of the library's collection 
and knowledge is significant. While the organization’s name
may give some clues, particularly if the user is a student or 
on the staff of an organization, different knowledge is required 
to use each library’s web site. 


‘Seamless’ service may exist in any one library’s web 
site but for a user navigating through more than one library 
the result is very far from seamless. To assess the needs of 
users both in the networked and print library environment a 
series of focus groups were conducted at the National Library 
from July to September 2000. The three groups comprised
post-graduate humanities students from the Australian
National University, Petherick Reading Room Readers and 
members of the Independent Scholars Association of 
Australia. The key comments made in focus groups were that 
users wish to access a mix of print and electronic resources 
with a new reference service. Traditional mediated reference
services answering enquiries are seen by the users canvassed
as not as critical as they have been. Rather, users expect the
reference service to provide them with the skills and 
knowledge to be able to search effectively themselves in the 
networked environment. Users recognize that this is very 
complex and requires constant work because of the changing 
resources available. 


Additional complexity is added by the fact that individual 
libraries offer access through different interfaces, offer 
different full text and indexes and have different degrees of 
coverage of their collections in electronic catalogues. Users 
expect libraries to offer a ‘one stop shop’; however, they
recognize that different resources and reference services are 
provided depending upon the nature of the library. The time 
required to retrain and reorient for each library was seen as a 
significant impediment to effective research. 


Unmediated reference services together with training 
emerged as the Cinderellas of reference service in a 
networked environment. Training is needed particularly for users
inexperienced in the use of computers and also for those
accessing new resources. It was seen as essentially
a face to face service in small groups, partly as this facilitates 
self paced learning and offers flexibility. Information literacy 
programmes based on either automated or online tutorials 
or rigid programmes were considered to be of only limited 
use. Unmediated online reference services such as lists of 
resources and guides to library research were seen as very 
valuable as they could be accessed anywhere, anytime. 




In summary, users expressed a desire to access 
resources, both library collections and tools to assist their 
research 24 hours per day in the situation of greatest 
convenience. Many users study late at night and wish to work 
effectively without requiring mediated reference services at 
this time. However, there does remain a need for mediated
reference services in training and answering library research
enquiries. Counterbalancing this is the experience of the e-mail
and web form enquiries. These enquiries suggest that for
many users information is now thought of as completely digital and
freely available.


Users expect library staff to provide resources
directly. They appear to have lost an understanding of the
library research process and the need to structure enquiries
for persistent searching. The new model of digital libraries
incorporating information literacy skills and services requires
a long term programme to develop the knowledge of our 
users. Matching library user and library expectations for digital 
reference reveals that there are some gaps in perceptions. 
The main differences are shown in Table 3.4. 


Table 3.4. Gaps in Perceptions


Library User Perceptions          Library Perceptions

‘Belong’ to many libraries.       See users as ‘belonging’ to that library.

Experience information needs      See interactions on web site (resource
anywhere at any time.             access and unmediated reference) and in
                                  library ‘space/hours’ in mediated reference.

Not dependent on individual       Structure information by details of each
products but seek aggregated,     component, e.g. title, publisher, etc.
tailored access.


3.9.2. Digital Reference Models


In the print world libraries answered reference enquiries
from their collection, the knowledge of library staff and the
network of contacts or other libraries developed over years. 
The principles of such a reference service include a 
recognition of the role or scope of each library, defined 
relationships with other libraries; knowledge of the collection 
strengths and a commensurate knowledge of the strengths 
of library staff. While these principles apply in the digital 
environment the potential scope of collections with Internet 
access is vast. 


Users may also come from communities not previously 
served by the library, but for whom Internet access has 
enabled the breaking of boundaries of geographic or 
knowledge isolation. These complex issues have led to the 
development of collaborative models of digital reference 
services. These models can build upon the distributed 
strengths of libraries and their ready connectivity. Some of 
the models that have emerged can be seen in the 
Collaborative Digital Reference Service and CORC and 
subject portals. Participants in the Collaborative Digital
Reference Service have taken part in a pilot
project to test the provision of professional library-quality 
reference service to users any time anywhere through an 
international digital network of libraries. 


Phase one took place from February to March 2000,
where the goal was to build a system with information on the
subject strengths and availability hours of participating
libraries for online reference, to enable routing and
management of reference enquiries. To help build these
profiles all libraries, including the National Library, contributed
some general reference enquiries and answers to indicate
their subject strengths. A web inquiry form was also
constructed and tested.
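

The profile-based routing described for phase one can be illustrated with a minimal sketch. It is not the software used in the pilot: the class LibraryProfile, the function route_enquiry, the sample libraries and the 'first available match' rule are all hypothetical and are introduced here only to show how subject strengths and availability hours might drive routing.

```python
from dataclasses import dataclass


@dataclass
class LibraryProfile:
    """Hypothetical profile of one participating library."""
    name: str
    subject_strengths: set[str]
    open_hour: int   # first staffed hour (0-23) in an assumed shared time zone
    close_hour: int  # hour at which staffing ends (exclusive)

    def is_available(self, hour: int) -> bool:
        # A service may run past midnight (e.g. 17 -> 1), so handle wrap-around.
        if self.open_hour <= self.close_hour:
            return self.open_hour <= hour < self.close_hour
        return hour >= self.open_hour or hour < self.close_hour


def route_enquiry(subject, hour, profiles):
    """Return the first library that is strong in the subject and staffed now."""
    for profile in profiles:
        if subject in profile.subject_strengths and profile.is_available(hour):
            return profile.name
    return None  # no match: the enquiry would need manual handling


# Illustrative data only; these libraries and subjects are invented.
profiles = [
    LibraryProfile("Library A", {"australian literature", "newspapers"}, 9, 17),
    LibraryProfile("Library B", {"newspapers", "law"}, 17, 1),
]

daytime = route_enquiry("newspapers", 10, profiles)
late_night = route_enquiry("newspapers", 23, profiles)
unstaffed = route_enquiry("law", 10, profiles)
```

With libraries in different time zones holding overlapping subject strengths, a table of this kind is one way a network could approach round-the-clock coverage.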


In phase two, due to commence in July, further work
was to be done with the software including more robust testing
of automated routing of inquiries, workflow issues, statistics
and benefits of using the service. This phase included referring
current inquiries asked of member libraries to other libraries.
Administrative issues including legal issues, staff training and
service level agreements were also to be resolved.


Phase three enabled detailed evaluation to take place.
During this phase inquiries were taken from patrons, and
‘stress tests’ of the model, service levels, performance, costs,
benefits and management/governance issues were
completed. The project aim is for participant libraries with
branch or other local libraries to share the answering of reference
enquiries based on their areas of expertise, aiming in the long
run for online reference services 24 hours a day, 7 days a
week. A conceptual model of the service is shown
in Fig. 3.19.


3.9.3. Co-operative Online Research Catalogue (CORC) 


OCLC, the largest American supplier of bibliographic 
records, announced in 1998 that it would work on the 
development of a Co-operative Online Research Catalogue 
project. The project comprised two parts: first, a catalogue of
Internet resources, to which the National Library contributed
records from Kinetica; and secondly, a mechanism to share
guides to help readers to identify Internet resources, material
held in library collections and databases, primarily based on
subjects.


An example of a guide which could be
usefully shared with other libraries is ‘Australian literature on
the Internet’. This guide is of great interest to American
universities offering courses in Australian and New Zealand
Studies. During the pilot phase the National Library tested
three guides through the Pathfinder system. The Internet 
resources catalogue component of CORC has become a 
production system, with a new charging schedule. The 
National Library produced a report on CORC pathfinders
suggesting some development directions that would improve
their effectiveness and efficiency.


[Figure: members can be at a national, regional, or local level; the diagram links the end user, the local library and the regional library system.]

Fig. 3.19. Conceptual Model of Service


The University of New South Wales Library also 
conducted an extensive evaluation and produced a report, 
which identified similar strengths and weaknesses in CORC. 
OCLC have indicated they will be reinvestigating pathfinders 
after the launch of the CORC database, and have developed 
a proposed list of enhancements based to a large extent on 
the National Library’s recommendations. The Council of Australian State
Libraries approved the establishment of a Working Group on 
Reference Issues at its meeting, and discussed the possible 
role of such a group. The National Library is chairing this 
group. The draft terms of reference are to identify issues in 
which a collaborative approach will result in better reference 
services in Australian libraries. Major issues include working 
towards an Australian library reference collection and virtual 
reference services. The group will also investigate 
developments such as call centre technology, and discuss 
reader research and reference needs, benchmarking and staff 
competencies. 


Its membership includes Reference Managers from all
state and territory libraries and the National Library. Its
priorities include collaborative work on reference tools, such
as reader guides and Internet guides. The benefits of the
project are to enhance access to this information through
creation of Australian library guides to resources covering all
states and territories and also to reduce the duplication of
work done in each library on these guides. A model of the
potential shape of the service is shown in Fig. 3.20.


[Figure: diagram labelled ‘National Library’]


Fig. 3.20. Model of Potential Shape


This service initiative has potential to provide an 
Australian node for the Collaborative Digital Reference 
Service. It should also provide a model for sharing pathfinders 
to contribute to CORC. A concept of subject access to
electronic information has been developed in a new form with
the introduction of subject portals or gateways. These aim to
enable access to resources within a defined area, forming
the next generation of tools, a step up from the resource-discovery
tools of enthusiastic researchers or amateurs. The
fundamental purposes of the information portals are:


— Provision of convenient and effective access to
information resources through a single gateway.


— Description of resources according to agreed standards
after selection for quality and subject content.


— Identification of information resources according to agreed
content guidelines.


The National Library has been involved in the 
development of technical and policy advice and in participation 
through contribution of records. This model of unmediated 
access offers great potential for sharing resource description 
for ‘one stop shop’ subject access. Issues for future 
development include integration of access to print and 
electronic resources, promotion and sustainable funding. 


The explosion of access to digital resources is,
however, a relatively recent phenomenon. In the evolution of
digital libraries the simple assumption that collections of
resources and sophisticated search engines will answer the
world’s enquiries is being replaced by new virtual service
models in hybrid libraries. Collaboration appears to be the
key for effective use of information services, both mediated
and unmediated.


In looking for future directions, forthcoming evaluations 
and consideration of how the projects could be amalgamated 
into larger combined services will be the next steps. Major 
issues including the role of the National Bibliographic 
Database, a possible National metadata repository and 
knowledge management system to create access to implicit 
knowledge need to be addressed. 


Lastly, there should be an integrated access interface for
information retrieval. Digital libraries typically integrate a
multitude of resources and media types. In fact, a digital library
may have not only a multitude of resources but also a multitude
of mechanisms to access these resources. Most libraries with
sizable collections in digital form have adopted a two-fold
strategy: (a) provide access to resources through
the library catalogue wherever possible; and (b) provide
access to electronic resources and specialized collections
through the library home page. In the case of electronic journals,
web access via an alphabetical listing and/or subject index of
all titles offers a quick and simple means of inventory and
direct hypertext links to full text sources. Similarly, access to
other specialized collections can be provided through a
set of menus that serves as a rough and ready finding aid. This is
particularly useful for institutions that have not implemented
web-based catalogues and cannot offer hypertext links from
a catalogue record. On the other hand, access to e-journals
is then separate from the online catalogue and other journals that
are part of the library’s collection. Web-based catalogues can
enable users to connect directly to the full text source via
hypertext links in the catalogue record.
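

The alphabetical listing of e-journal titles with direct hypertext links can be sketched as follows. This is only an illustration under stated assumptions: the function name az_listing, the journal titles and the example.org addresses are invented, every title is assumed to be non-empty, and a real library page would normally be generated from the catalogue or a link resolver.

```python
from collections import defaultdict
from html import escape


def az_listing(journals):
    """Group (title, url) pairs by initial letter and render simple HTML links."""
    groups = defaultdict(list)
    for title, url in journals:
        groups[title[0].upper()].append((title, url))

    lines = []
    for letter in sorted(groups):
        lines.append(f"<h2>{letter}</h2>")
        # Sort titles case-insensitively within each letter.
        for title, url in sorted(groups[letter], key=lambda pair: pair[0].lower()):
            lines.append(f'<a href="{escape(url, quote=True)}">{escape(title)}</a>')
    return "\n".join(lines)


# Hypothetical titles and placeholder addresses.
journals = [
    ("Digital Studies", "https://example.org/digital-studies"),
    ("Archives Review", "https://example.org/archives-review"),
    ("Data Quarterly", "https://example.org/data-quarterly"),
]
page = az_listing(journals)
```

A subject index could be produced in the same way by grouping on a subject field instead of the initial letter.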


Thus, we can provide services to our users in a hybrid
library environment until a completely digital environment is
achieved.


4 
Documents and 
Resources in Digital 
Libraries 


A publication need not be owned to be readily accessible
in an electronic environment. Many sources of information
can be made available to the user community that could not
be made available if the libraries had to acquire or purchase and
store them. An electronic collection can be more flexible
and dynamic, offering a wider variety and the choice of selecting
only that part of a document that is relevant, so that the whole
source need not be acquired. Locating items in
machine readable form is quicker than locating them in a conventional
printed shelf collection. Many libraries in developed nations
have largely shifted to an electronic environment, and those
in developing countries are in the process of moving
towards the new electronic environment to find their place
and fit into the changing information society with all its
limitations.


But in a digital environment, only digital information is
disseminated and most of the information is obtained by
remote access. Much of this information is less permanent
in nature. In these circumstances it is very difficult for librarians
to decide:


— who should organize it, as most of such information is
accessed directly by users, without the knowledge of
librarians;


— what standards are to be followed; and


— where users are located and from where they access the
information.


Information is not usually structured, no
rules or codes are followed and no one controls the
information that is made available.


The raw data or information may be of different types.
To organize the data or information, we require cataloguing
practice, which calls for an appropriate data model for
organizing data in a standard way. Specialized technologies
are needed for compressing as well as for organizing
information. These are the major challenges in the digital library
environment. Digital library technologies can be applied
across all kinds of institutions. These technologies were
almost unheard of a few years ago and now the pace of
development and application of these continues to accelerate.


Earlier, books were an important channel for 
dissemination of information. They constituted an important 
and useful aid for the support of teaching and learning 
purposes. They still play an important role in promoting
culture, education and everyday ways of life. But one of the
outstanding and exciting developments of the 20th century is the
publication of documents in electronic form. Electronic
media have provided many possibilities and opportunities for
providing faster and quicker access to information at the global 
level through communication networks, hitherto not possible, 
thereby making the whole world a global village. 


Conventional books as carriers of information have
some limitations, viz., they are difficult to reproduce, expensive to
disseminate and difficult to update; single copies cannot be
shared and can be damaged or vandalized easily; they are bulky to
transport; their embedded material is unreactive and static; they cannot
utilize sound, animation or moving pictures; and they are
unable to monitor the reader’s activity, assess the reader’s
understanding, or adapt material dynamically.


Electronic publications cover the rapidly increasing area
of publications that require a computer to be used to access
the information that they contain. They can be documents
distributed free of charge or obtained by purchase. They are
supplied in two forms: off-line publications and on-line
publications. Some electronic publications are not supplied
on physical carriers; rather, they need to be copied into the
libraries’ access system and stored on hard disc stacks,
tape streamers or other data storage systems, while others
are supplied on physical carriers and can be stored on
shelves.


Off-line Publications : An off-line publication is an 
electronic document which is bibliographically identifiable, and 
which is stored in machine readable form on an electronic 
storage medium. CD-ROM, diskettes or floppy discs and 
magnetic tapes are examples. 


— Off-line monograph, e.g., a CD-ROM encyclopaedia. 
— Off-line serial, e.g., a CD-ROM journal. 


On-line Publications : An on-line publication is an 
electronic document which is bibliographically identifiable,
which is stored in machine readable form on an electronic 
storage medium and which is available on-line. For example— 
an electronic journal, a World Wide Web page or an on-line 
database. 


— On-line monograph, e.g., a dictionary on the Web. 
— On-line serial, e.g., an electronic journal on the Web.


— On-line resource, e.g., an organization’s home page. 


4.1. OFF-LINE PUBLICATIONS 


Electronic publications can be electronic in origin,
but they can also be digitized versions of
written or printed documents. For many collections, most of
the electronic publications will be digitized versions of
written or printed documents in their possession.


Electronic documents differ radically from printed ones, 
which are typically distinct entities of a static, unchanging 
nature. Initially, electronic documents were also just imitations 
of printed documents in electronic form but now they have 
become more and more compound and dynamic. Compound 
means that the document can consist of various distributed 
“information objects,” discrete components, each of which 
can be at a different physical location. It can also include 
metadata, e.g. author, revision history, etc., and links to 
external multimedia modules like images, video, audio, 
graphics, etc. 


An electronic document can be nothing more than a 
set of pointers to a number of components. Dynamic means 
that any component of the document can change, giving the 
document a temporary nature. It is obvious that the up-to- 
date nature of a networked information resource is its strength 
but also its weakness. It is exactly what someone needs in a
rapidly changing world and at the same time is something 
that cannot be used as a reliable and constant citation, since 
the reliability of a resource can be assured only by its 
maintainer. 
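

The idea of a compound, dynamic document as a set of pointers can be made concrete with a small sketch. The class names Component and CompoundDocument, the version counter and the example addresses are assumptions made for illustration; they are not drawn from any particular digital library system.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One 'information object'; location is a hypothetical pointer."""
    location: str
    version: int = 1


@dataclass
class CompoundDocument:
    """A document held as metadata plus pointers to distributed components."""
    title: str
    metadata: dict = field(default_factory=dict)
    components: list = field(default_factory=list)

    def snapshot(self):
        """Record which version of each component the document points to now."""
        return tuple((c.location, c.version) for c in self.components)


doc = CompoundDocument(
    title="Example report",
    metadata={"author": "A. Author"},
    components=[
        Component("https://example.org/text/chapter1"),
        Component("https://example.net/images/figure1"),
    ],
)

cited = doc.snapshot()            # what a reader saw when citing the document
doc.components[1].version += 1    # the remote maintainer revises one component
changed = doc.snapshot() != cited
```

Because any component can change independently, a citation is stable only if it records something like the snapshot above, or if the maintainer guarantees that the components stay fixed.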


Electronic documents available on the Internet are
commonly referred to as Networked Information Resources
(NIRs) or online documents. Although access to most NIRs
is currently uncontrolled and free, controlled access (e.g.,
through user registration and passwords), and fee-based
access (based on networked payment mechanisms) are
options, which allow the addition of commercial and
copyrighted materials to the range of networked information
resources.


The critical factor in the effective use of the resources 
is the foundation of a common ground of standards, which is 
absolutely necessary for the improvement of their 
interoperability. The concept of interoperability includes wide 
usefulness (re-usefulness), portability (across networks, 
systems, and organizations) and longevity i.e., portability 
across time. The key to the interoperability of content is 
consistency, and consistency is achieved through the use of 
standards. 


First, some light on the selection criteria for electronic
documents. Libraries today buy licenses for an ever-increasing
number of information resources from a range of different 
publishers and providers, and use a diverse set of 
technologies for information delivery. In addition, a wealth of 
relevant resources are freely available on the Web for libraries 
to incorporate into their e-collections and to make them readily 
available to their users. Materials may be in print and/or 
electronic form; formally and/or informally published; and 
stored locally, for access via an institution’s intranet, or 
remotely and accessible via the Internet. A number of services 
are outside the library’s control, but nonetheless libraries want 
to integrate their resources, presenting the information from 
any particular source within the context of the complete 
collection. Searching across repositories is only part of the 
solution. While not all subscriptions lend themselves to 
electronic delivery, electronic subscriptions offer a great 
potential for increased value to the entire organization. The 
move from atoms to bits complicates the jobs of information
professionals, but the benefits (competitive advantage and
access to information by a wide spectrum of users) can
be tremendous.


Like print materials, digital titles added to the collection
need to match the needs of the clientele, assuring appropriate
scope, content, depth, and quality. The materials selected
should be affordable, and the content must be timely,
bibliographically accessible and in the appropriate language,
etc. It is also presumed that there are no technical reasons
why the library cannot provide access (e.g., the resource does not
require a proprietary browser and permits printing), and that its use
by library users and librarians will not require an inordinate
amount of training.


Similarly hyper-linking assumes major importance. The 
simplicity and ubiquity of the Web is such that users now have 
high expectations of their own digital library environments in 
terms of ease of navigation between the many and varied
electronic information resources. They also expect minimum 
efforts for their information-seeking activities. It is the librarian, 
as information intermediary, who is in a better position to 
determine how resources should be interlinked to provide for 
ease of navigation, i.e., what should be linked to what, and 
how it should be linked. The librarian has traditionally applied 
a broader range of knowledge to packages of information 
and is now challenged in extending their skills to highly diverse 
and widely dispersed users wherever and whenever they 
need it. The Web is undoubtedly facilitating linking among 
digital items - and we see many excellent examples of this in 
current offerings from information providers - secondary and 
primary publishers, and from the aggregators. 


Now, let us look at the important electronic documents.
Broadly, off-line electronic documents may fall under the
following categories :


— CD-ROMs
— DVDs
— Electronic Journals and
— Electronic Books.


4.1.1. CD-ROMs 


The first product in the evolution of optical storage of
information is the CD-ROM. It is a permanent optical
storage medium that, in conjunction with an associated drive,
becomes a powerful peripheral for the PC. The CD-ROM
puts multi-megabyte, permanently stored databases at the
user's front end on a PC. Its only drawback is that the end user
cannot put his own information on it. The advantages of storing
data on CD are:


— high storage capacity, up to 700 MB;
— no head crash;
— low error rate, correctable data;
— long life;
— random access;
— easily transportable;
— operating system independent file system;
— low cost;
— reliable medium; and
— less shelf space.


CD-ROMs are a chief e-resource of a library; they
store a large amount of data together with user-friendly
search software. They can be networked through a CD-server
or exist as stand-alone units, and may be either specific or
general in coverage. There is a developing trend to use
CD-ROMs for specialized collections of full text materials.


The first disc in the family to be developed was the 30 cm
analogue LV (Laser Vision) Disc for video. This usually
consisted of two discs stuck back-to-back to form a
double-sided disc with one hour of video per side. A sub-format
was developed which could store up to 54,000 still video images
per side. The LV disc was the most successful of several
attempts to generate market acceptance, but it was soon
superseded by the DVD (Digital Versatile Disc or Digital Video
Disc), which was launched in 1997. The DVD is of the same
diameter as the CD (12 cm) but, by using a laser with a shorter
wavelength, the storage capacity of one layer is increased
by a factor of seven to 4.7 GB.


Additionally, a dual layer structure is possible, with the
laser refocused to read each layer, thus nearly doubling the
capacity to about 9 GB. In principle, by gluing two such double
layer disks together like the LV video disks, a total capacity of
about 18 GB can be achieved. The disk is intended for the
storage of data-reduced video films or, like CD-ROMs, texts and
multimedia data with, however, considerably higher storage
capacities.


Write-Once Recordable Media : There are several 
types of write-once recordable disks. The format that is 
becoming the most widely used is the recordable CD (CD-R 
or CD-WO) which has been available since 1993. Having the 
same format and storage capacity as the audio CD and the 
CD-ROM, the CD-R can be played on the appropriate 
standard CD drives. The polycarbonate body of the disk has 
a dye layer placed on it which is then coated with a metallic 
reflective layer. The dye layer carries the data in place of the 
pits of pressed discs. 


When recording, high-intensity laser pulses change the
dye from transparent to opaque. The low-intensity read laser
reads the changes in reflected light as a digital bit stream.
Once written, the data cannot be altered. CD writing drives
are available at different speed levels. The CD-R is
a well established and standardized format. Different
standardized software protocols are available for recording
audio CDs and CD-ROMs. The Photo-CD is a CD-R with a
proprietary software protocol to record photographs as
electronic still images.




CD-Rs are but the latest and most prominent examples 
of so-called WORM (Write Once, Read Many) disks which 
have been in use as computer storage media for quite some 
time. The biggest problem with WORMs is the great variety 
of systems and formats. A number of producers offer WORMs 
with a continuous helical recording format similar to a sound 
LP disk; others offer disks with ring-shaped tracks as on 
computer floppy and hard disks. Some can use both formats. 
The proprietary software of WORMs poses a problem, too. 
Not even the physical dimensions are standardized. One 
writing method used by a number of manufacturers including 
LMS, Toshiba and Sony burns pits in the metallic surface of 
the disc with a laser beam. Another system supported by ATG 
and Optimen creates bubbles by the heat of the laser beam. 
In both cases the reflectance of the metallic layer is changed 
and the data can be read by a low power laser beam. 


Re-writable Optical Media : In contrast to the preceding
optical media, data on re-writable (erasable) optical disks,
whether Magneto-Optical (M/O) or Phase-change, can be
altered or deleted many times. There are re-writable optical
disks in the 5.25 inch format and, more recently, in the 3.5 inch
format. The most common are still the magneto-optical discs,
where a laser beam in the write mode heats the inner layer of
the optical disk and thus changes the polarity of a magnetic
coating. The resulting microscopic magnetic marks of different
polarity can be read as a bit stream by a low-energy laser
beam in the read mode.


A more recent recording technology is Phase-change,
where the carrier layer is coated with a thin semi-metal film,
which can be in either an amorphous or a crystalline state.
A laser beam in the write mode can change single spots to
either an amorphous or a crystalline state so that, again, a
digital bit stream is created. Phase-change may replace M/O
in the future. Re-writable optical disks have a short access-time
of 600 milliseconds. The storage capacity has steadily
increased up to the current 2.6 GB.


Optical Tape : Optical tape is packaged in a cassette
for use as a WORM format data storage tape. The tape drives
are made by EMASS in the USA and supplied in Europe by
GRAU Storage Systems. Kodak has also launched a competing
system. The tape contains a dye layer which changes its state
when a high power laser beam is applied and can be read by
a lower power laser, the same basic method as for CD-Rs.
Because the tape is a sequential carrier, the access time can
be quite long. In compensation, the storage capacity of one
tape is considerably greater than that of a disc, up to 100 GB.


The advantages that make CD-ROM particularly
attractive in today’s competing information environment are:


• Multi-Media : The ability to give an unlimited number of end
users access to over 600 MB of information using ordinary
post. It can be used for text, graphics, data, audio
and video in one simple purchase.


• Multiplatform : It is the first truly system independent
medium, which allows equal access to information
regardless of the computer hardware or software
platforms used by the end users.


• Multilingual : Sophisticated CD-ROM systems today
allow the end users to select the operating language of
their choice and change it on-the-fly.


The technology which has most characterized the
nineties is multimedia. Multimedia technology integrates text,
images, graphics, video, animation, sound or music. It is
interactive on the lines of hypertext and hypermedia.
Multimedia faces a number of technological barriers, for
example, the extra disk space required to store multimedia
formats and the broad bandwidth required to transmit them.




Various encyclopaedias, such as Encyclopaedia Britannica,
and dictionaries, such as Webster Dictionary, are available
on CD-ROM. Other examples include:


(i) Cross-Cultural CD, published by Silver Platter
Information, UK, contains full text articles on Human
Relation subjects.


(ii) Current Contents on CD-ROM - Arts and Humanities,
published by Institute for Scientific Information, USA,
contains a number of journal articles on Arts and
Humanities in a database form.


(iii) Current Contents on CD-ROM - Social and Behaviour
Science, published by Institute for Scientific Information,
USA, contains a database of journal articles from more
than 1600 journals on Social and Behaviour Science.


(iv) AGRIS International, published by Dialog, contains a
database of journals on Agriculture.


(v) CAB Abstracts, published by Dialog, contains abstracts
of articles from more than 50 journals.


(vi) Biotechnology Abstract on CD-ROM, published by
Silver Platter Information Ltd, UK, contains abstracts of
articles from a good number of journals on Biotechnology.


4.1.2. Digital Video Disc or Digital Versatile Disc (DVD) 


The first read-only optical disc to become available
commercially was the 12 inch optical video disc in 1978,
followed by the 8 inch optical video disc in 1982 and the 4.72
inch CD-ROM in 1985. When the Compact Disc (CD) appeared
on the optical scene, many felt that the ultimate in data storage
had been achieved, but it was not to be so. With technology
moving at an extremely rapid pace, the next generation of
optical disc technology was made available in the form of the
Digital Video Disc or Digital Versatile Disc (DVD). The first
DVD players appeared in Japan in November 1996, followed
by the US players in March 1997. Currently, over a hundred
models of DVD players are available from dozens of electronics
companies, giving this technology scope to improve
further.


The advent of DVD, with its high data storage capacity of up
to 17 GB, has made it possible to include more multimedia
elements, like video and sound, and to integrate several
reference sources on a single disc. DVD is the next generation
after the Compact Disc (CD) in optical disc storage technology.
A DVD looks just like a CD (both are 120 mm in diameter) but
has a higher data storage capacity. Like a CD, data are
recorded on a DVD in a spiral trail of tiny pits, and the discs
are read using a laser beam. The DVD's larger capacity is
achieved by making the pits smaller and the spiral tighter, and
by recording the data in as many as four layers, two on each
side of the disc. To read these tightly packed discs, lasers that
produce a shorter wavelength beam of light are required, with
more accurate aiming and focusing mechanisms. In fact, the
focusing mechanism is the technology that allows data to be
recorded on two layers. To read the second layer, the reader
simply focuses the laser a little deeper into the disc, where the
second layer of data is recorded. However, since a 135-minute
movie fits on a single DVD layer, single layer DVDs will be the
most common. DVD technology provides a storage capacity
that is at least 6-7 times greater than that of a CD in the
same areal space. A movie more than two hours long can
be stored on a DVD disc, with very high quality video. There
is an equivalent increase in audio quality too. The main features
of DVD are its compression technology and the storage of data
in multiple layers and on both sides. A comparison of the basic
characteristics of CD and DVD is presented in Table 4.1.
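
The capacity figures quoted above can be cross-checked with a little arithmetic. The following Python sketch is illustrative only: it assumes the per-side capacities of 4.7 GB (single layer) and 8.54 GB (dual layer) listed in Table 4.1 and a CD capacity of 0.7 GB (about 700 MB); the function name dvd_capacity_gb is hypothetical.

```python
# Per-side DVD capacities in GB, taken from Table 4.1.
# A dual-layer side holds 8.54 GB, slightly less than twice 4.7 GB.
PER_SIDE_GB = {1: 4.7, 2: 8.54}
CD_GB = 0.7  # CD-ROM capacity (about 700 MB)


def dvd_capacity_gb(sides, layers_per_side):
    """Return the nominal capacity of a DVD with the given geometry."""
    if sides not in (1, 2):
        raise ValueError("a DVD has 1 or 2 sides")
    if layers_per_side not in PER_SIDE_GB:
        raise ValueError("a DVD side has 1 or 2 layers")
    return sides * PER_SIDE_GB[layers_per_side]


if __name__ == "__main__":
    for sides in (1, 2):
        for layers in (1, 2):
            capacity = round(dvd_capacity_gb(sides, layers), 2)
            print(sides, "side(s),", layers, "layer(s):", capacity, "GB")
    # Ratio of a single-layer, single-sided DVD to a CD
    print("DVD/CD ratio:", round(dvd_capacity_gb(1, 1) / CD_GB, 1))
```

Under these assumptions the double-sided, dual-layer disc comes to 17.08 GB, which is the figure rounded to 17 GB in the text, and a single-layer DVD holds roughly 6.7 times as much as a CD.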


4.1.2.1. Types of DVD Formats 


DVD comes in five formats, which are described
below:


Table 4.1 : CD/DVD Comparison


Characteristics                      DVD                     CD-ROM

Disc diameter                        120 mm                  120 mm
Disc thickness                       1.2 mm                  1.2 mm
Sides                                1 or 2                  1
Layers per side                      1 or 2                  1
Capacity (GB)                        4.7, 8.54, 9.4, or 17   0.7
Track pitch (microns)                0.74                    1.6
Min pit length (microns)             0.4-0.44                0.83
Linear velocity used for scan (m/s)  3.5-3.84                1.2-1.4
Laser wavelength (nm)                635 or 650              780
Numerical aperture                   0.6                     0.45
User data rate                       About 10 Mbit/s         About 1.4 Mbit/s
                                     (DVD-Video)             (Video-CD)
Durability and dust/scratch          Same as that of CD      High
resistance


DVD-ROM : DVD-ROM is a high capacity, read-only
optical disc format that can be used as a general purpose
computer storage device. The DVD-ROM format book does
not discuss the application programs or content that may be
published in the DVD-ROM format. This means that the
DVD-ROM format can be used for a wide variety of purposes
within a personal computer environment.


DVD-Video : DVD-Video is a read-only optical disc
format that can be used for the interactive playback of high
quality video, audio and graphic content. DVD-Video has been
designed to provide a significantly higher level of video and
audio quality for playback of full-length motion pictures as
well as interactive games. The major technical features of
the DVD-Video format include:


— General purpose digital volume and file structure;
— High quality multimedia format;
— Built-in navigation and search functions;
— Regional coding;
— Parental lock-out;
— Copyright protection; and
— DVD-Video adhering.


DVD-Audio : DVD-Audio is a read-only optical disc
format that can be used for the playback of high quality audio
content. The DVD-Audio standards are developed by the DVD
Consortium members and other interested parties. The goal
of the standard was to provide a consumer format that may
provide better audio quality than the current Audio-CD format,
with backwards compatibility so that Audio-CD titles can be
used on a DVD-Audio player.


DVD-R : DVD-R is a write-once/read-many (WORM)
optical disc format that can be used as a general purpose
computer storage device. The DVD-R standard has been
approved and published as Revision 1.0. DVD-R
devices have a storage capacity of approximately 3.95 GB
per side.


DVD-RAM : DVD-RAM is a read/write optical-disc 
format that can be used as a general-purpose computer 
storage device. DVD-RAM devices have a storage capacity 
of approximately 2.6 GB per side. 


4.1.2.2. DVD Format Compatibility 


Not all DVD recorders are compatible with each other,
and some of them have compatibility issues with home DVD
players. Table 4.2 compares the DVD formats from the
compatibility viewpoint and summarizes some other
features.


Table 4.2 : DVD Format Compatibility


Format    Advantages                       Disadvantages

DVD-RAM   - Rewritable                     - Slow when used for data
          - Longer durability                storage
          - Well tested technology         - Still incompatible with most
          - Improving                        DVD drives and players
                                           - Relatively expensive disks

DVD-R     - Cheaper discs than             - Not rewritable (like CD-R)
            rewritable media
          - High compatibility with
            older DVD-ROM drives and
            players

DVD-RW    - Rewritable                     - Often incompatible with
          - Compatible with many             older drives and players
            current DVD drives and
            players

DVD+R     - Has potential for greater      - Not rewritable (like CD-R)
            compatibility with older       - Has compatibility issues
            drives and players than RW
          - Cheaper than rewritable
            discs

DVD+RW    - Rewritable                     - Largely incompatible with
          - Compatible with most             older DVD drives and players
            current DVD drives and
            players
          - Fast

4.1.2.3. Care and Handling of CDs and DVDs 


CDs and DVDs can be reliable for many decades with
proper handling. As with all other types of media, degradation
is inevitable over time, but steps can be taken to help
prevent it from occurring prematurely. The effects of
environmental conditions and physical handling on optical
discs are presented below:




Environmental Conditions : Optical discs may perform 
well within a wide range of temperature and relative humidity 
conditions. Discs kept in a cooler, less-humid environment 
and not subjected to extreme environmental changes should 
last longer. If stored at a very low temperature relative to the 
user environment, the disc should be gradually acclimated to 
the environment in which it will be used to reduce stress and 
moisture condensation. A significant, abrupt temperature 
change will cause greater stress than a gradual change. 
Leaving the disc in its packaging will allow gradual acclimation 
to a changed environment. Discs used frequently should be 
stored at a temperature similar to that of the environment in 
which they are to be used. This minimizes stress from frequent 
temperature changes. 


Effect of Light on ROM Discs: Although the effect of
light on ROM (read-only) discs over time is not known, the
effects of long-term exposure to light (e.g., UV, infrared,
fluorescent) under ambient intensity, such as room lighting,
are generally thought to be so minimal that light is not
considered a factor in the lifetime of the ROM disc. Any effect
of light on the disc would involve degradation of the
polycarbonate substrate and would become noticeable only
after several decades of exposure to daily storage facility
lighting or sunlight through windows. Light effects on ROM
discs, therefore, are considered negligible.


Effect of Light on Writable Discs: Prolonged exposure 
to sunlight or other sources of UV light can significantly 
increase the degradation rate of the dye layer in writable discs. 
Deterioration of the dye makes it less transparent. As a result, 
some, or all, of the unmarked areas in the dye could be read 
as marks, depending on the severity of degradation. These 
areas will then result in errors when read by the laser. 
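
The mechanism described above can be illustrated with a deliberately simplified model. In the Python sketch below, each spot on the disc is represented by the fraction of laser light it reflects, and a spot is treated as a mark when that fraction falls below a threshold. The reflectance values, the threshold of 0.5 and all function names are assumptions made for illustration; real drives use a more elaborate channel code together with error correction.

```python
THRESHOLD = 0.5  # assumed: below this reflectance a spot is read as a mark


def read_marks(reflectances, threshold=THRESHOLD):
    """Classify each spot: True means mark, False means unmarked."""
    return [r < threshold for r in reflectances]


def degrade(reflectances, loss):
    """Model dye deterioration as a uniform drop in reflected light."""
    return [max(0.0, r - loss) for r in reflectances]


def count_errors(original, degraded):
    """Count the spots whose classification has changed."""
    return sum(1 for a, b in zip(original, degraded) if a != b)


if __name__ == "__main__":
    # Hypothetical disc: marks reflect 0.2, unmarked areas reflect 0.7.
    disc = [0.7, 0.2, 0.7, 0.7, 0.2, 0.7]
    fresh = read_marks(disc)
    slightly_aged = read_marks(degrade(disc, 0.1))
    badly_aged = read_marks(degrade(disc, 0.3))
    print("errors after slight degradation:", count_errors(fresh, slightly_aged))
    print("errors after severe degradation:", count_errors(fresh, badly_aged))
```

In this toy model a small loss of reflectance leaves every spot correctly classified, while a larger loss pushes the unmarked areas below the threshold so that they are read as marks, which is the failure described in the text.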


The most likely cause of damage to these discs from
direct sunlight is heat buildup in the disc affecting the dye.
Much of the ultraviolet range of sunlight can be filtered, or
absorbed, by glass, e.g., the glass of a window. However,
the lower-frequency infrared range will pass through a
window and generate heat in the disc. A disc in a case, or
one with a dark label, printing, or colour that allows it to absorb
more sunlight, is more prone to heat buildup
from direct sunlight exposure. The effects of heat buildup can
be minimized if the disc is kept cool, such as in an
air-conditioned room.


Effect of Light on CD-RW and DVD-RW, DVD+RW, 
and DVD-RAM Discs : Light should have minimal, if any, 
effect on RW and RAM discs, for the phase changing film 
used in such discs is not light sensitive. This film, however, is 
affected by heat; in fact, it is the heat that is generated from 
the intense laser beam that writes data in the phase-changing 
film. Heat buildup in RW or RAM discs caused by direct 
sunlight will accelerate the degradation rate of the phase- 
changing film just as it does that of the dye in writable discs. 
The phase-changing film in RW and RAM discs degrades 
naturally, and from heat buildup by direct sunlight, at a faster 
rate than the dye in R discs. 


Moisture : The polycarbonate substrate, or the plastic 
composition, that makes up most of the disc is a polymer 
material that is vulnerable to moisture. Any prolonged 
exposure to moisture resulting from a spill, humid air, or 
immersion allows water to become absorbed into the disc, 
where it may react with any of the layers. Returning the disc 
to a dry environment will allow the absorbed moisture or water 
to dissipate out of the disc over time; however, water or a 
water-based liquid may leave behind, within the disc, 
contaminants such as dyes or other dissolved minerals. If 
the disc has experienced no permanent damage from 
absorption of the liquid, it should play normally. 


Surface-Handling Effects: Anything on an optical disc
surface that impedes the ability of the laser to focus on the
data layer can result in missing data as the disc is being read.
Some surface-handling effects are discussed below:


Scratches generally cross data lines on the disc, and
how deep and wide they are will determine the extent of
interference with laser focus on the data. Small or occasional
scratches will likely have little or no effect on the ability of the
laser to read the disc, because the data are far enough below
the surface of the disc that the laser is focused beyond the
scratch. This is comparable to the effect of a light scratch on
a pair of eyeglasses, which does not markedly impair vision
because the viewer's eyes are focused beyond it.


Scratches on the label side of single-sided DVDs are 
not likely to pose a problem. The metal layer so prone to 
damage in CDs is in the middle of DVDs. Its location makes 
this layer almost impervious to surface scratches. It is in fact 
unlikely to be affected by any but the deepest scratches-those 
deep enough to reach the center of the disc where the metal 
and data lie. 


Fingerprints, smudges, dirt or dust on the laser reading
side of the disc can disrupt laser focus on the data even more
than a scratch can. Dirt or dust on the disc will block or reduce
the light intensity of the laser. If severe enough, it will cause
the disc drive to miss data as the disc is being read.
Fingerprints, smudges, or dirt cover wide areas of data and
will cause the laser beam to go out of focus or lose intensity.
They will also cause widespread misreading of data along
the data lines or tracks, to an extent that exceeds the error
correction capability of the disc drive. Dust can also spin off
into the disc drive and collect on the laser head or other
internal components. However, fingerprints, smudges, and dirt
are easier to remove than scratches.


Marking : Marking and labeling a CD or DVD is an
essential process in its creation. CDs and DVDs, or their
containers, are labeled in some form or fashion so that they
can be identified and organized. When labeling a CD with a
marker, however, attention must be paid to the composition of
the ink, and the marker should have a soft tip.


Cleaning : If the disc needs cleaning, the following
points should be remembered:


— Use an air puffer to blow off dust.
— Use a soft cotton cloth or chamois to wipe the disc.
— Try cleaning with a dry cloth first, before using any
cleaning solutions.
— Do not wipe in a direction going around the disc.
— Wipe from the center of the disc straight towards the
outer edge.
— Avoid using paper products, including lens paper, to
wipe the disc.
— Avoid using anything abrasive on the surface of the disc.
— If the disc has a heavy accumulation of dirt, try rinsing
it with water first.
— Use commercially available water-based detergent
formulated for cleaning the surface of optical discs.
— Use isopropyl alcohol or methanol, as an alternative to
water-based detergents, to clean the disc surface.


Optimum environmental conditions for the storage of 
digital media are well documented. Table 4.3 summarizes 
the recommendations for magnetic tape, CD-ROM, and 
DVDs. 


Table 4.3 : Environmental Storage Conditions for Digital Media

[Table values not legible in the source. Columns: media; access
storage (allows immediate access and playback), with temperature
(°C) and relative humidity (%); long-term storage (preserves the
media as long as possible), with temperature (°C) and relative
humidity (%). Rows: magnetic tape cassettes (12.7 mm), magnetic
tape cartridges, magnetic tape (4 and 8 mm helical scan),
magnetic tape, CD-ROM, and CD-ROM and DVD.]


4.1.2.4. Library Applications


The applications of DVD technology in libraries include
the following:


• Large full-text databases,
• Mixed databases,
• Quick reference databases,
• Very high quality multimedia that helps in better reference
service,
• Archival purposes,
• Cost-effective mass storage for networking purposes,
and
• Preservation media for multimedia libraries etc.


4.1.2.5. Some DVD Reference Sources 


With the refinement of optical technology tailored to the
needs of libraries and information centers, a large number of
reference sources have started appearing in DVD form. A
few such sources which deserve mention are as follows:


• The first database to be made available on DVD was the
Union Catalogue of Belgian Research Libraries by Silver
Platter (November 1996).

• Silver Platter’s MEDLINE Advanced was the first
bibliographic database, with over 8 million citations and
abstracts of articles from 3700 journals from 70
countries.

• EBSCO Publishing now offers Business Source Elite
on DVD-ROM, which provides access to a rich collection
of popular business magazines, scholarly journals and
trade publications with coverage from 1984.


Some multimedia encyclopedias are now also available
in DVD form, such as :


• Britannica DVD with over 73,000 articles etc.

• Webster's International DVD Encyclopedia (published
by Multimedia 2000) with 10 million words, 50,000
entries, 12,000 photos, 67 videos and animations etc.

• Funk & Wagnall’s Multimedia Encyclopaedia with over
75,000 multimedia elements.

• Grolier Multimedia Encyclopaedia, and

• The Complete National Geographic on DVD-ROM.


The above stated DVD reference sources are very
useful in various branches of knowledge and satisfy the needs
of users to a great extent.


4.1.2.6. Merits of DVD-ROMs 


Some of the relative merits of DVD technology that
deserve mention are as under:


• High data storage capacity of 17 GB.
• DVD can deliver data at a higher rate of 1.38 MB/s.
• DVD drives can read both CD-ROMs and DVD-ROMs.
• No telecommunication facility is essential.
• Printing and downloading the information is feasible,
and
• Multi-lingual databases can be handled more
effectively.
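
To put the quoted data rate in perspective, the following Python sketch estimates how long it would take to read a full disc. It is a rough calculation under stated assumptions: a constant rate of 1.38 MB/s as quoted above, a single-layer capacity of 4.7 GB, and 1 GB taken as 1000 MB. The function name read_time_minutes is hypothetical.

```python
def read_time_minutes(capacity_gb, rate_mb_per_s):
    """Time needed to read capacity_gb at a constant rate, in minutes."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = capacity_gb * 1000 / rate_mb_per_s  # assumes 1 GB = 1000 MB
    return seconds / 60


if __name__ == "__main__":
    # Single-layer, single-sided DVD at the quoted rate
    print(round(read_time_minutes(4.7, 1.38), 1), "minutes")
```

Under these assumptions a full single-layer disc takes a little under an hour to read from start to finish, which is why random access to individual records matters more in practice than sequential throughput.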


4.1.2.7. Pitfalls of DVD-ROMs 


In spite of its numerous advantages, DVD
technology is not free from perils. This high-end technology
suffers from the following pitfalls:


• Extra hardware is required in PCs.
• DVD drives are priced higher than CD-ROM drives, and
• The number of reference sources available on DVD is
much smaller than the number of CD-ROM versions.


But there are a lot of features which make DVD popular.
DVDs are compact in size, making them easy to handle. Since
DVDs are read by a laser, they are resistant, up to a point,
to fingerprints, dust, smudges and scratches. Most scratches
will cause minor channel data errors that are easily corrected.
That is, data are stored on DVDs using powerful error
correction techniques that can recover from scratches as big
as 6 millimeters with no loss of data. Awareness of DVD
among libraries and the number of DVD reference titles are
expected to increase in the days ahead. Although the database
industry itself is moving in the direction of web-based
resources, or linked sources, DVD holds tremendous
promise; it is one of the competing and viable formats,
and its place in libraries is assured. Features like higher
quality of sound and video, higher rate of data transfer, data
security, etc. make DVD a more viable option than CD-ROM.
However, problems like lack of standards among the
manufacturers of DVDs and drives and the need for extra
hardware on PCs, coupled with their higher price, make the
growth of DVD technology slow.


4.1.3. E-Journals 


The use of computers for information storage and
retrieval activities began in the early 1960s in an offline, batch
processing, tape-oriented mode. Vast amounts of bibliographic
data for printing of indexing and abstracting services were
computer processed and then printed. Gradually, computers
were increasingly used for photo typesetting and for other
operations relating to publishing. Computerized typesetting
and page layout software are now commonplace. Journal
articles are now submitted on disk (and online), and publishers
apply their skills in quality management to the presentation and
layout of material already in machine-readable form. The
process creates data files from which output can be generated
for other media. As such, full-text online journals have been
available through online hosts like DIALOG and STN for the
past several years.


However, publication of e-journals, as we know them
now, is a comparatively new phenomenon. With the advent of
CD-ROM technology as an optical storage medium in the
mid-1980s, several electronic journals started appearing on
CD-ROM. The first major developments in this direction were
projects experimenting with electronic equivalents of printed
journals. One of the oldest examples is ADONIS, where
images of articles published in printed journals are distributed
on CD-ROM. Still older examples are the full-text online journals
offered by the major host organizations. Online hosts like
DIALOG and STN have offered not only online databases
but also full-text online journals for the past several years,
although as simple ASCII or text files without graphics and
pictures. In 1989, there were almost 1,700 full-text sources
available through sixteen online systems. All of these projects
involved journals and all of them were by definition electronic,
but these journals were not truly electronic, and can at best
be described as electronic versions of printed journals. The
recent growth in electronic journals can be attributed to three
important factors, i.e., advanced technology, its availability at
low price and the convenience it offers. Technological changes,
especially the Internet and web technology, continue to attract
more and more traditional players to adopt it. There has been
a steady move up the technological scale from the early
low-end electronic publications available as ASCII files, to being
organized and searchable on gophers, to being tagged and
graphically viewable on World Wide Web sites.


Electronic magazines or e-zines, as we know them now,
come in many forms and flavours. Some of them are traditional
paper journals simply made available electronically; others
are sample selections, or just the tables of contents, of the
paper journals; still others have no equivalent paper copies.
As of now, electronic journals are the digital equivalent of their
print counterparts. Whereas databases may contain only
some full text or links to full text, electronic journals are literally
the entire journal.


The advent of electronic full text journals affords the
opportunity to take a fresh approach, recognizing that any
risk to publishers in the new electronic age is likely to fall on
small and medium size libraries, which operate on restricted
budgets. Many e-journals are now available online.
Libraries can subscribe to electronic journals from publishers
or through a second party (e.g., vendors) just as they
subscribe to print journals. This distinction between publishers
and second parties is an important one, as these are the two
ways libraries get electronic journals. Some companies create
collections of entire journals and sell access to these
collections. Second party electronic journal databases are
different from other databases because in a second party
electronic journal database, the entire journal is collected, as
opposed to a full text database like “Academic Search Elite”,
where all, some, or none of any given journal may be included.
Some publishers provide free online access to the journals
they publish against the library's print subscription.
Publishers provide access to these e-journals either through
their homepages, for example Cambridge University Press,
Chicago University Press etc., or through aggregators such as
Analytic Press, Blackwell, Sage Publications, Springer etc.
E-journal access can be made through :


• Individual subscription: In this type, the library acts as 
a source of information on what is available. It requires 
the users to have access to the Internet, but there are no 
storage problems. 

• Local storage at institutional level: In this type, the 
journals are stored once, and the search mechanisms 
used to access them are controlled by the institution. 

• Commercial e-journal providers: There are many 
commercial providers or aggregators which ensure 
high standards in delivery and presentation. Using such 
a service may require special equipment and/or 
software. 

• Cooperative projects at a national or international 
level: In this type, a group of institutions/organizations 
together negotiates with the publisher. For example, 
CAUL (Council of Australian University Librarians) has 
negotiated agreements with Current Contents to allow 
access for all Australian universities to this service. 


4.1.3. Process of Publication of a Journal 


The process of publication of a journal, whether on paper 
or on electronic media, has certain well-defined activities, which 
include: 

A. 

i. Collection of manuscript from author; 

ii. Evaluation of contents by the process of refereeing/peer 
review; 

iii. Editing the technical contents; 

iv. Improving the language and style of presentation; 

v. Composing, proofreading, page making, 
designing; 

vi. Printing; and 

vii. Binding. 

B. 

i. Cost estimation and pricing; 

ii. Publicity and advertising, catalogue, etc.; 

iii. Distribution and marketing; 

iv. Feedback and updating; and 

v. Copyright and other legal aspects. 


Publishers have now re-engineered their print-based 
production process to accommodate electronic publications. 
Journal articles are frequently submitted on disc or online, 
prepared in one of the popular word processing packages. These 
machine-readable files are pulled into publishing software like 
FrameMaker and PageMaker. The documents are converted 
into SGML, HTML, PostScript and PDF. While the HTML and 
PDF versions are posted on the web, the SGML version, which 
is a rich archive format, is used for preservation. 


Electronic publishing and access to publications over the net 
led to hopes in some quarters that a less expensive publishing 
model built around electronic publishing and delivery might 
emerge. Unfortunately, none of the electronic publishing 
initiatives to date has resulted in significant savings for 
libraries. Those publishers that offer electronic and print 
subscriptions tend to sell them for a “bundled” price, usually 
of the order of 10 to 30 percent over the price of the paper 
subscription alone. Only a small number of publishers are 
offering electronic journals as a separate, stand-alone 
subscription, which may be priced the same as the print product 
or slightly higher in the name of the added value the e-version 
offers. 
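The bundled pricing described above is simple arithmetic, and a small sketch may help when comparing quotations. The function name and the sample print price of 1000 are illustrative assumptions; only the 10 to 30 percent range comes from the text.

```python
def bundled_price_range(print_price, low_markup=0.10, high_markup=0.30):
    """Return the (lowest, highest) bundled print-plus-online price.

    The default markups follow the 10 to 30 percent range quoted in the
    text; a real publisher's markup may differ.
    """
    if print_price < 0:
        raise ValueError("print_price must be non-negative")
    low = round(print_price * (1 + low_markup), 2)
    high = round(print_price * (1 + high_markup), 2)
    return low, high


if __name__ == "__main__":
    # Hypothetical print-only subscription price of 1000 currency units.
    low, high = bundled_price_range(1000.0)
    print(f"Print only: 1000.00; print plus online: {low:.2f} to {high:.2f}")
```

On these assumptions, a journal costing 1000 in print would cost between 1100 and 1300 as a bundle.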


One of the major issues that publishers are 
concerned with is safeguarding their economic interests in the 
process of providing electronic access to their printed 
publications. Publishers make a significant investment 
in the process of production of a journal, which involves 
activities like peer review, administration, editing, layout 
design, production, subscription management and 
distribution. Most activities that are performed for publishing 
a journal are common to both electronic and paper media, 
except for production and distribution, where the cost involved 
is relatively low. Tenopir and King concluded in a study that 
the costs of electronic journals cannot be substantially lower 
than those of their printed versions. 


4.1.3.2. Features of Electronic Journals 


An electronic journal need not simply be an electronic 
mimic of a paper journal. The available technology can provide 
dynamism to an electronic publication hitherto impossible in 
print publication. Some of the features that electronic journals 
can provide using the available technology include: 


— Linking citations and references to bibliographic 
databases or to full-text articles, where possible; 


— Links to graphics / photographs, video or audio clippings 
not included in the paper; 


— Links to corrections or to later articles that cite the paper; 


— Access to more detailed data or to multimedia 
information provided by the author; 


— Links to external databases; 


— Links to reader’s comments or discussion forums 
related to the paper; 


— “Dual Publishing” in more than one electronic journal, 
for example, a Chemistry article of interest to biologists 
could appear both in a chemistry and a biology journal; 


— A “living article” where the user could log in at any time 
and see an experiment on an ongoing basis showing 
data collected that day; and 


— Embedded software programs allowing users to mirror 
the authors’ work by manipulating data or running 
simulations based on their own inputs. 


4.1.3.3. E-journals to E-articles 


Publishing is already shifting from a focus on the issue 
of a journal as the fundamental unit to a focus on the individual 
article. Favouring early release of scientific content, major 
scientific publishers have discussed releasing 
published material on an article-by-article basis, not bound to 
any specific weekly or monthly issue. The American Chemical 
Society (ACS) has already taken a lead in this direction. The 
ACS posts individual journal articles 
as soon as they make it through the review, editing, and author 
proofing process, a format the ACS calls “As Soon As Publishable 
(ASAP)”. The production process has been modified so that 
instead of thinking of journals as batches of articles that get 
bundled up and put into specific issues that come out once a 
week or once a month, the ACS now treats each article 
individually as it is finished. 


4.1.3.5. Printed Versus Electronic Journals 


Web technology provides a dynamism to 
electronic documents that was not possible in the essentially 
sequential style of presentation of printed documents. 
Interactive hyperlinks to related resources, links to a full range 
of multimedia, links to traditional indexing and abstracting 
services, etc. are some of the novelties that are commonplace 
in a web document and were not possible in a traditional 
printed document. Web publishers or e-journal publishers 
claim heavy investments. Publishers create not only bit-map 
page images but also HTML and PDF formats to provide added 
advantages for their electronic journals. Publishers have to 
invest in new staff, training in computers and networks, 
upgradation and backup systems. 


E-journals have grown rapidly in prominence. They 
started about 15-20 years ago. When electronic journals first 
appeared, publishers did not know how to sell them, nor did 
librarians know how to handle them. The technology scenario 
prevailing in those days was entirely different too. Technology 
kept advancing, and today e-journals sell like hot cakes in the 
developed world. The trends in the developing world are 
equally encouraging, especially in India. This fast change in 
mindset has taken many publishers by surprise. Earlier, 
negotiations that involved expanded access to electronic text 
usually began with print subscriptions as the bottom line of 
the deal. During the past couple of years, especially in the 
West, a number of consortia demanded that print and online 
be negotiated separately. They wanted online access at the 
core of pricing negotiations because their users clearly 
preferred online. They wanted an online-plus-print model, 
rather than print-plus-online. This is called “flip pricing”. Few 
publishers adopted that model, however, and most libraries 
have continued to receive print and online more or less as a 
package deal. This has proved a real blessing for Indian 
libraries. The effects of flip pricing have started 
crystallizing of late, and most of the leading publishers have 
started offering flip price models to Indian libraries too, mostly 
under the consortia banner. 


The evolution of the deeply discounted prices (DDP) 
model is yet another significant recent development on the 
journal pricing front. As DDP has evolved, three 
models have emerged that govern the relationship between 
agents and publishers. First, the publisher requires the 
customer to place electronic orders directly with them; with 
the second version, customers may place their orders with 
agent or publisher, but customers are given an incentive to 
deal direct; and in the third scenario, customers are free to 
order either way. 


In 2000, 75 percent of publishers offered online access free 
with print, with the rest charging extra for online. But every year, 
publishers change their mind on this one issue, and last year 
the pendulum seemed to be swinging back in favor of charging 
extra for online with a print subscription. Of the 7,623 e-journals 
with set prices in EBSCO's reserve, 61 percent were 
offered free with a print subscription in 2001. The majority of 
major publishers still offer online “free” with print as a rule. 
The list includes Cambridge University Press, Elsevier, 
Emerald, Oxford, Routledge, Sage, Springer Verlag, and 
Taylor & Francis. Blackwell Science charges extra, but the 
other Blackwell imprints do not. University presses are split: 
Duke, Indiana, Johns Hopkins, and Penn State charge; MIT, 
Chicago, and California do not. Society presses are also split, 
but the majority appear to charge for online with print. The 
list of STM publishers that charge extra includes giants such 
as Academic, Wiley, Kluwer, Dekker, Nature, Plenum, and 
S. Karger. 


4.1.3.6. Copyright and IPR Issues 


Issues of copyright, intellectual property, and fair use 
are important to libraries, but librarians are struggling to make 
sense of all the issues in the new digital age. We need to 
learn a lot about the “tricks of the trade” so that our hands are 
not burnt in the process. The number of accessible electronic 
journals is growing steadily. Studies are in progress 
the world over to determine the trends in copyright 
policies among scholarly electronic journals. There are a 
number of issues about which the librarian should be very 
careful while entering into digital subscriptions. Catches can 
come in multiple forms with the vague yet meticulously 
manipulated term “licensing”. The audit and accounts 
departments should be amply briefed about the problems and 
features of digital subscriptions. Proper measures and back-up 
agreements should be secured from the publishers/vendors. 


In the case of free electronic journals, according to the 
literature, the policies are found to be quite generous in the 
area of fair use and encourage sharing of information. Some 
of the salient features of copyright issues of a select set of 
freely accessible electronic journals are as follows: (1) 
copyright policies are informal, (2) policies are common to 
print and electronic titles, (3) contributing authors generally 
retain copyright, (4) noncommercial uses permitted include 
browsing, printing personal copies, and downloading, and (5) 
republishing is not permitted. Several of these journals 
address library use. 


4.1.3.7. Access Related Issues 


Access to e-journals can be arranged from the 
respective publishers against a User ID/Password or through 
IP authentication. For a wider audience, IP-based access is 
mostly preferred, as the users need not bother about 
the User ID and Password every time. A useful alternative is 
to provide the User ID/Password on the library’s portal/website 
so that users can pick them up during usage and, over a 
period of time, memorize them. Libraries need to take 
care of the ‘site license’ aspects of content access, as some 
publishers lay down stringent rules on this, making the life of 
the librarian miserable. Before jumping into such services, the 
librarian should also ensure the necessary Internet bandwidth, 
failing which complaints will pile up, resulting in a growing 
aversion to electronic services in general. For full-text 
access to e-journals and datasets published online, 
publishers usually provide non-exclusive, non-transferable 
rights and licenses. Unlimited access is provided to 
tables of contents, article abstracts, chapter summaries and 
websites for other electronic products. 
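IP authentication, as described above, amounts to checking whether a request originates from an address range the library has registered with the publisher. The following is a minimal sketch of that check, not any publisher's actual implementation; the address ranges are reserved documentation ranges used here as hypothetical campus networks.

```python
import ipaddress

# Hypothetical campus address ranges registered with a publisher.
# These are reserved documentation ranges, not real campus networks.
LICENSED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_licensed(client_ip, networks=LICENSED_NETWORKS):
    """Return True if client_ip falls inside any registered network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
        status = "access granted" if is_licensed(ip) else "login required"
        print(ip, "->", status)
```

A user inside a registered range needs no password, which is why this mode suits a wide campus audience; a user outside it falls back to the User ID/Password route.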


E-journal proliferation has changed how academic 
libraries do business. Since the Second World War, a group 
of vendors has played a broker’s role between library 
acquisition departments and journal publishers. Rather than 
dealing with hundreds of small and large journal publishers, 
libraries achieved economies by using these vendors as 
intermediaries to manage all facets of the exchange, from 
invoicing to replacing lost Internet address, not individual 
passwords. Libraries had their own short-sightedness, failing 
to understand the market needs of publishers and their fears 
of copyright infringement. 


Now, after a few years of sometimes contentious 
negotiations, the parties have reached a mutual understanding. 
New licensing guidelines spell out the needs and access 
restrictions of scholarly publishers and academic libraries. 
Publishers now understand the unique requirements of 
academic libraries, just as we understand the financial and 
copyright concerns of publishers. Yet another important 
aspect is the local administration of access management by 
the users. Users are to be properly oriented towards effective 
and efficient use of the Net as well as the journal sites. 
Unnecessary and unscrupulous hits, revisits and downloads 
can result in frequent choking of the Net connectivity, posing 
unwanted problems to the entire user community on campus, 
and also to LAN/Net system administration. 


4.1.3.8. Some Issues, Myths and Realities of E-journals 


The following are some unresolved issues in the area 
of electronic journals: 


— How much Full-text is Full-text? 
— How to maintain Reliability and Accessibility of Data? 


— How to preserve Intellectual Property Right and 
Copyright? 


— How to assure Electronic Archiving and Backfile 
Availability? 


Electronic journals are among the most talked-about aspects 
of the Internet. Some of the views widely held about e-journals 
are far from the truth. Woodward et al. described some of these 
widely held myths and associated unrealities: 


— Electronic journals provide better access to journal 
articles; 

— Academics and researchers read journals at their work; 

— Readers want electronic journals; 

— Electronic journals are quick and convenient to access; 

— Users know the publishers of journals; 

— Readers want “Page Integrity”; 

— Electronic journals will save money for the library; 

— Storage and dissemination of e-journals is inexpensive 
or free; 

— Electronic journals will save paper; 

— Electronic journals will save publishers money; 

— E-journals would make subscription agents redundant; 

— Only recent issues of journals are required; and 

— All scholarly journals will be electronic within a few 
years. 


But in spite of the above myths and unrealities, interest 
in e-journals is increasing. The ongoing shift towards 
electronic publishing and access is expected to continue, 
in spite of the fact that the print medium is still preferred for 
ease of reading and portability, and that authors 
still consider it the authoritative medium and format for the 
publication of peer-reviewed research. The scenario may, 
however, change in time to come, depending upon the value 
additions made by the electronic mimics of printed versions. 
Hypertext and hypermedia linkages have greater 
applications in the literature of science and technology. 
Electronic publishing still has to achieve economy, authority 
and authenticity in addition to the advantages of speed and 
value addition it already possesses. The future of e-journals will 
depend on the relative advantages that they can offer to 
those who publish, use or manage them. Journals are in PDF and 
Adobe’s PostScript formats, with references in HTML, so they 
are hot-linked. Subscribers and non-subscribers can obtain 
tables of contents, published abstracts and an advance listing 
of accepted papers scheduled for upcoming issues. Non-subscribers 
should be able to purchase individual articles. 
Information scientists have been toying with the idea of 
replacing the existing print-based scholarly communication 
system with a system that revolves around the users rather 
than the authors and publishers. 


4.1.4. Electronic Books 


Books are par excellence the traditional storage 
medium of information and represent a classical tool for 
information delivery. They have a deep historical value and a 
well defined structure; they are easy to read and are 
accessible. In the traditional library, computers are applied 
to back office services. The book collection is still available in 
paper form, which brings a number of disadvantages: 


— Once identified, a book of interest may not be available. 


— It is difficult to retrieve publications of interest, even if 
they are present in the library. 


— A large and costly physical space is necessary for 
storing. 


Most of these problems are overcome in the electronic 
library. The electronic book is a metaphor for handling and 
structuring large volumes of computer-based information, 
consisting of a collection of reactive (screen-based) pages of 
electronic information that are usually organized in a thematic 
way and that exhibit many of the characteristic features and 
properties of a conventional book. 


Generally, three basic types of information are 
embedded within the pages of an electronic book, viz., 
consumer-oriented, control and aesthetic information. 
Electronic books may contain different types of information 
such as text, pictures and sound, or all of them; however, 
the electronic media are no doubt expensive. On the basis of 
the types of information that they contain, the basic properties 
that they exhibit, and the functions that they have to perform, 
electronic books are classified into the following types, viz., 
textbooks, picture books, talking books, moving picture books, 
multimedia books, polymedia books, hypermedia books, 
intelligent electronic books, telemedia books and cyberbooks. 
Electronic books can be published on a variety of 
distribution media such as magnetic diskettes, optical 
read-only disks, recordable disks and rewritable disks. 
Computer communication networks are another important 
publishing medium for electronic books. 


The Internet has caused a revolution in the book 
publishing industry with the emergence of the electronic book 
(e-book). The advantages of e-books for libraries are 
straightforward and include easy access to content; on- 
demand availability; impossibility of being lost, stolen, or 
damaged; capability of searching within a book and across a 
collection of books; links to other resources, including 
dictionaries and thesauri; no physical space requirements; 
no device needed to access content; content accessible using 
standard web browsers; customizable search interfaces; easy 
transportability; and access from anywhere. 


Opportunities for publishers have also been created 
with the birth of the e-book. E-books have been credited for 
the revival of the scholarly monograph. They also provide an 
opportunity for publishers to maintain a competitive position 
in the publishing and e-commerce markets. The emergence 
of the e-book has given publishers new ways to serve 
customers by repurposing content and creating living books 
that incorporate text, audio, video, and other resources, such 
as dictionaries and thesauri. 


4.1.4.1. Definition of e-book 


An e-book is based both on emulating the basic 
characteristics of traditional books in an electronic format and 
on leveraging Internet technology to make an e-book easy 
and efficient to use. An e-book can take the form of a single 
monograph or a multivolume set of books in a digital format 
that allows for viewing on various types of monitors, devices, 
and personal computers. The technology should allow 
searching for specific information across a collection of books 
and within a book. E-books also contain text and graphical 
images. A graphical object is always visible and appears 
in the text as in a paper book. A distinctive feature of the text 
is that it contains embedded links, i.e., active text that is 
connected to other parts of the book. Hyperbook links can be 
grouped into categories such as: 


Hierarchical links, which are defined in the table of 
contents, list of figures, list of tables and index. 


Transverse links, which are defined in the text and 
may be further divided into two groups: (a) Window links, 
whose activation causes the display of 
additional information in an overlapped window without 
changing the current position; and (b) Jump links, 
whose activation allows access to cross-referenced 
information in another part of the same book or in another book. 
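The difference between these link categories can be sketched as a small data model. The class and function names below are hypothetical and chosen only for illustration; the behaviour follows the description above, where a window link leaves the reading position unchanged and a jump link moves it. Treating hierarchical links as navigating to their target is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LinkKind(Enum):
    HIERARCHICAL = "hierarchical"  # from table of contents, lists, index
    WINDOW = "window"              # transverse: opens an overlapped window
    JUMP = "jump"                  # transverse: moves to a cross-reference


@dataclass(frozen=True)
class Link:
    kind: LinkKind
    target: str


@dataclass(frozen=True)
class ReaderState:
    position: str                  # where the reader currently is
    overlay: Optional[str] = None  # content shown in an overlapped window


def follow(state, link):
    """Return the reader state after activating a link."""
    if link.kind is LinkKind.WINDOW:
        # Extra information appears; the current position is unchanged.
        return ReaderState(position=state.position, overlay=link.target)
    # Hierarchical and jump links both move the reading position.
    return ReaderState(position=link.target, overlay=None)


if __name__ == "__main__":
    state = ReaderState(position="chapter-2")
    print(follow(state, Link(LinkKind.WINDOW, "footnote-7")))
    print(follow(state, Link(LinkKind.JUMP, "chapter-5#fig-3")))
```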


An e-book should utilize the benefits of the Internet by 
providing the abilities to embed multimedia data, to link to 
other electronic resources, and to cross-reference information 
across multiple resources. 


An e-book collection should be accessible anytime, 
anywhere via the Internet and should require no more than a 
personal computer to access the content. An ideal e-book 
should provide content of value, the ability to view online, the 
ability to download to a PC or view off-line, and the ability to 
view on a handheld device or personal digital assistant. Users 
should be guaranteed privacy for the content they access 
and use and should be able to aggregate and customize items 
and content regardless of format. 


Copy and print capabilities for portions of the e-book 
should be permitted within copyright and fair-use laws. 
Copyright protection must be ensured regardless of whether 
the content is accessed via the Internet or via a downloadable 
reader that allows access to the book off-line. 


A dominant developing model is based on the belief 
that an e-book = content. Therefore, an e-book cannot be a 
device; nor can it be a mechanism of creation; nor can it be 
defined as one dedicated source of content. An e-book is the 
content itself. It is the intellectual property of the author and 
the copyright holders. Based on this premise, the content, 
even in an electronic world, should be available to share 
between and among users, as content produced on paper 
has been and is currently used, while in compliance with fair- 
use and copyright laws. However, the ideal e-book model 
leverages the Internet and the electronic environment to 
provide more efficient and effective means of aggregating, 
organizing, and making content accessible while retaining the 
integrity and essence of the traditional book industry and the 
use of content that is easily accessible and not restricted by 
devices or technical environments. 


E-books have yet to gain global distribution, but 
publications in electronic form offer substantial advantages 
over paper books in providing aids for connectivity, 
audio-visualization, dynamics, customizability, interactivity and 
rapid information retrieval. It is said that electronic 
publications are not ipso facto superior to those printed on 
paper, but they are very much more than print-on-paper 
presented electronically: they provide greater opportunities 
for transferring information easily from one location to 
another, the information can be automatically searched in 
order to locate items of interest, and they facilitate better and 
faster access to information through networking. Though 
electronic books have several advantages, they are not free 
from disadvantages. These include high initial costs to the 
publishers as well as to the libraries, the incompatibility of 
hardware and the use of different retrieval software by different 
publishers. The use of electronic media also depends on 
user-friendly retrieval software. Moreover, as a prerequisite, 
electronic media necessitate the availability of a computer, 
a communication network and electronic information access 
facilities. 


4.1.4.2. E-book Formats 


Current e-books are made available in several different 
standards, each with its own advantages and disadvantages. 
Besides popular formats such as HTML, RTF and PDF, which 
can be read by all e-book readers, the following major e-book 
standards are commonly used: 


• Amazon Kindle has its own format, loosely based on 
the older French Mobipocket format 

• Sony uses the EPUB format (an open standard for e-books) 

• The Cybooks use an Adobe-protected PDF format 


Except for public domain books, an encryption feature 
is used on all e-books. A selection of the most 
popular open standard e-book file formats is given below: 


EPUB : Short for electronic publication, EPUB is an 
open standard for e-books created by the International Digital 
Publishing Forum. It is widely supported by publishers and 
e-book reader hardware manufacturers. Additionally, free 
software is available to read EPUB format e-books on most 
computers. The EPUB format can support images, and its text 
is reflowable, meaning the text can be resized without 
compromising word wrap around images and at page 
margins. 


PDF : PDF, or Portable Document Format, is a file format 
created by Adobe Systems. It is a well established open 
standard that can be read by most computers and is 
supported by most e-book readers. Like EPUB, it can display 
images. However, unlike EPUB, it does not support word 
wrap. 
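The contrast between reflowable text (EPUB) and fixed line breaks (PDF) can be illustrated with plain text wrapping. This is only an analogy using Python's standard `textwrap` module, not how either format is actually rendered; the sample sentence and the two widths are arbitrary assumptions standing in for a small and a large screen.

```python
import textwrap

# Arbitrary sample sentence used only for illustration.
SAMPLE = ("Reflowable text is wrapped again to fit whatever line width "
          "the reading device offers.")


def reflow(text, width):
    """Re-wrap text for a display that fits `width` characters per line."""
    return textwrap.wrap(text, width=width)


if __name__ == "__main__":
    for width in (40, 20):  # e.g. a larger and a smaller screen
        print(f"--- width {width} ---")
        for line in reflow(SAMPLE, width):
            print(line)
```

The same words appear at both widths, only the line breaks move. A fixed-layout page keeps the breaks chosen at publication time, so a narrower screen must scroll or shrink the page instead.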


Plain Text : Plain Text is the “lowest common 
denominator” of open standard text files. It can be read by 
virtually all computers, and most e-book readers. It however 
does not support images, or digital rights management, 
making it a poor choice for publishers who wish to block users 
from making unauthorized copies of their works. 


Besides, there are various proprietary formats. A 
selection of the most popular proprietary e-book file formats 
is given below: 


Kindle : The Kindle file format can only be read by 
Amazon Kindle, Kindle 2, and Kindle DX e-book readers. 
Software is available from Amazon to read the Kindle format 
on PCs, as well as iPhones and iPod Touches. Mac and 
BlackBerry versions of the Kindle software are currently being 
developed. 


eReader : eReader is a file format created by Palm 
Media. Note that eReader, the file format, and e-book 
readers, the physical devices sometimes referred to as just 
e-readers, are two different things. The eReader format is 
used in conjunction with the EPUB file format by Barnes and 
Noble’s online e-book store. Software is available to view 
eReader formatted books on PCs, Macs, iPhones, iPod 
Touches, and BlackBerry. 


BBeB : Short for Broad Band eBook, BBeB was created 
by Sony and Canon for Sony e-book readers. 




Further, an e-book file format comparison chart is given 
in Table 4.4. 


Table 4.4. E-Book File Formats Comparison Chart 


Format                    Filename     DRM      Image    Word Wrap  Open 
                          Extension    Support  Support  Support    Standard 
Plain Text                .txt         No       No       Yes        Yes 
HTML                      .html        No       Yes      Yes        Yes 
PostScript                .ps          No       Yes      No         Yes 
Portable Document Format  .pdf         Yes      Yes      No         Yes 
DjVu                      .djvu        X        Yes      No         Yes 
FictionBook               .fb2         Yes      Yes      Yes        Yes 
Mobipocket                .prc, .mobi  Yes      Yes      Yes        Yes 
Kindle                    .azw         Yes      Yes      Yes        No 
eReader                   .pdb         Yes      Yes      Yes        No 
BroadBand eBook           .lrf, .lrx   Yes      Yes      Yes        No 
EPUB                      .epub        Yes      Yes      Yes        Yes 
TomeRaider                .tr2, .tr3   Yes      Yes      Yes        No 
ArghosReader              .aeh         Yes      Yes      Yes        No 
Microsoft Reader          .lit         Yes      Yes      Yes        No 
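A comparison chart of this kind lends itself to a simple lookup structure. The sketch below encodes a few rows of Table 4.4, on the assumption that its four feature columns are DRM support, image support, word wrap support and open standard, which is consistent with the format descriptions in the text. The field names and the `select` helper are hypothetical.

```python
# A few rows of Table 4.4, keyed by format name. The column
# interpretation (drm, images, word_wrap, open_standard) is an assumption.
FORMATS = {
    "Plain Text": {"ext": (".txt",), "drm": False, "images": False,
                   "word_wrap": True, "open_standard": True},
    "HTML": {"ext": (".html",), "drm": False, "images": True,
             "word_wrap": True, "open_standard": True},
    "PDF": {"ext": (".pdf",), "drm": True, "images": True,
            "word_wrap": False, "open_standard": True},
    "EPUB": {"ext": (".epub",), "drm": True, "images": True,
             "word_wrap": True, "open_standard": True},
    "Kindle": {"ext": (".azw",), "drm": True, "images": True,
               "word_wrap": True, "open_standard": False},
    "BroadBand eBook": {"ext": (".lrf", ".lrx"), "drm": True, "images": True,
                        "word_wrap": True, "open_standard": False},
}


def select(**criteria):
    """Return format names whose properties match every given criterion."""
    return [name for name, props in FORMATS.items()
            if all(props[key] == value for key, value in criteria.items())]


if __name__ == "__main__":
    # Open standards that can still carry DRM protection.
    print(select(open_standard=True, drm=True))
```

A library choosing formats for long-term access might, for example, filter on open standards first and then on the features its readers need.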


The hardware for an e-book consists of: (i) dedicated 
e-book readers; (ii) PDAs and pocket PCs with book reading 
software; and (iii) hybrid devices. The Apple Newton MessagePad 
was the first dedicated e-book reader, developed by Alan 
Kay, who articulated the concept of the Dynabook as a portable 
electronic book in 1968 when he was a postgraduate student. 
Its production was discontinued in 1998 with the availability of 
the much lighter and leaner Palm Pilots. The Rocket e-book, a 
paperback-sized device that could hold about 10 books (4,000 
pages of text and graphics), was the first modern e-book 
reading appliance, launched by Nuvomedia in 1998. 
The SoftBook from SoftBook Press came as its closest competitor 
and could hold as many as 250 books (1,00,000 pages). 
The SoftBook comes in a leather cover which, when opened, 
automatically starts up the book. The SoftBook supports functions 
like choosing a title, page turning, bookmarking, underlining and 
annotating using touch-screen technology. Both Nuvomedia 
and SoftBook Press were acquired by GemStar eBook Group, 
and the Rocket and SoftBook have been superseded by the 
REB1100 and REB1200, produced by RCA through a 
licensing deal with GemStar. The combined projected sale 
for the REB1100 and REB1200 is 3 to 7 million units. GoReader 
is another e-book reader, designed for university students, who 
can obtain textbook contents directly from the goReader website. 


PDAs and pocket PCs are usually smaller than 
dedicated e-book readers and primarily function as personal 
organizers. Often they also offer Internet access, word 
processing, spreadsheet and MP3 playing capabilities. With 
e-book contents and their viewing software becoming 
available for PDAs and pocket PCs, they are increasingly 
being used for reading e-books. Palm Reader, MobiPocket 
Reader and Microsoft Reader are some of the e-book reader 
or viewer software packages. 


There are also hybrid devices, which perform the tasks 
of dedicated e-book readers as well as those of PDAs and 
pocket PCs, besides having a few more added functionalities. 
e-BookMan and MyFriend are such hybrid devices, with 
larger screens intended for reading long streams of text, 
buttons for turning pages and the usual e-book capabilities 
such as bookmarking and annotating. They also have built-in 
capabilities to perform tasks like e-mail, address book, to-do 
lists, Internet browsing and MP3 playing. 


Today, companies such as Adobe, Mighty Words, 
Everybook, and GemStar are in the business of developing 
technology that will, it is hoped, help in transforming e-books 
into a medium as friendly and ubiquitous as their print 
counterparts. Adobe CoolType is yet another recent 
development that dramatically improves the on-screen text 
resolution of digital content. CoolType’s cross-platform, 
cross-font compatibility should reassure e-book 
consumers that they will be able to enjoy clearer, crisper 
type and a reading experience that is closer to the clarity of 
printed type. 


A comparison of the file formats supported by some e-book 
readers is given in Table 4.5. 


4.1.4.3. Digital Rights Management 


Digital Rights Management, also known as DRM, is 
technology that limits the use and duplication of copyrighted 
works. DRM may limit the number of times you may read, 
watch, or listen to a file, what type of device you can use the 
file on, the number of devices you can use the file on, or 
whether or not you can make a copy of the file. 


There are some key points to remember: 


• Most e-books purchased through online stores come 
with some sort of Digital Rights Management (DRM) 
technology. 


• In order for an e-book reader to display an e-book, both 
the e-book file format and DRM must be compatible 
with the e-book reader. 


• DRM restrictions can be subject to change with little or 
no prior warning. 


Table 4.5. E-Book Reader File Format Compatibility Chart 
(Source: Wikipedia) 


E-Reader             Plain  PDF  EPUB  HTML  Mobi-   Fiction  DjVu  Broadband  eReader  Kindle  WOLF  Tome    Open 
                     text                    Pocket  Book           eBook                              Raider  eBook 

Nokia N900           Yes    Yes  Yes   Yes   Yes     Yes      No    No         No       No      No    No      Yes 
NUUT Book 2          Yes    Yes  Yes   No    No      No       No    No         No       No      No    No      No 
Onyx Boox 60         Yes    Yes  Yes   Yes   Yes     Yes      Yes   No         No       No      No    No      No 
PocketBook 301 Plus  Yes    Yes  Yes   Yes   Yes     No       Yes   No         No       No      No    No      No 
Sony Reader          Yes    Yes  Yes   No    No      No       No    Yes        No       No      No    No      No 
Viewsonic VEB612     Yes    Yes  Yes   Yes   Yes     No       No    No         No       No      No    No      No 


[The first part of the table covers the Amazon Kindle 2 and DX, Apple iPad, 
Azbooka WISEreader, Barnes & Noble Nook, COOL-ER Classic, Hanlin e-Reader V3, 
Hanvon WISEreader, iRex iLiad and iriver Story; the entries for these 
readers are not legible.] 

A selection of popular DRM technologies is given below: 


ADEPT : Created by Adobe, ADEPT, or Adobe Digital 
Editions Protection Technology, allows users to download an 
e-book onto six devices at one time. E-books that are 
available through the Monroe County Library System and 
OverDrive are protected by Adobe ADEPT DRM. 


Amazon DRM : E-books sold through Amazon’s online 
store are protected by their proprietary DRM. Most books 
can be downloaded to six devices at a time, although for some 
titles that limit is less. 


FairPlay : E-books sold through Apple’s iBookstore 
incorporate their FairPlay DRM. 


Barnes and Noble DRM : E-books sold through Barnes 
and Noble’s online bookstore incorporate their proprietary 
DRM. E-books protected by Barnes and Noble DRM can be 
downloaded to one device at a time. However, Barnes and 
Noble e-books can be lent to another person for up to 14 
days. 


A comparison of some e-book readers and the DRM schemes 
they support is given in Table 4.6. 


Table 4.6. Some E-Book Readers with DRM Facilities 


E-Book Reader                 Adobe    Apple      Amazon   Barnes and 
                              ADEPT    FairPlay   DRM      Noble DRM 

Barnes and Noble Nook         Yes      No         No       Yes 
Amazon Kindle                 No       No         Yes      No 
Apple iPad                    No       Yes        Yes      Yes 
Sony Reader Daily Edition     Yes      No         No       No 
Sony Reader Pocket Edition    Yes      No         No       No 
Sony Reader PRS-505           Yes      No         No       No 
Sony Reader PRS-700           Yes      No         No       No 
Sony Reader Touch Edition     Yes      No         No       No 
Windows PC                    Yes      No         Yes      Yes 
Apple Macintosh               Yes      No         Yes      Yes 
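
The rule stated among the key points earlier, that a reader can display an e-book only if it supports both the file format and the DRM scheme, can be sketched in a few lines of Python. The DRM entries below are transcribed from Table 4.6 for four of the readers; the file-format sets and the function name `can_display` are illustrative assumptions, not data from the tables.

```python
# DRM schemes supported by each reader, transcribed from Table 4.6.
DRM_SUPPORT = {
    "Barnes and Noble Nook": {"ADEPT", "Barnes and Noble DRM"},
    "Amazon Kindle": {"Amazon DRM"},
    "Apple iPad": {"FairPlay", "Amazon DRM", "Barnes and Noble DRM"},
    "Sony Reader PRS-505": {"ADEPT"},
}

# File formats per reader: ASSUMED values for illustration only.
FORMAT_SUPPORT = {
    "Barnes and Noble Nook": {"EPUB", "PDF", "eReader"},
    "Amazon Kindle": {"Kindle", "plain text"},
    "Apple iPad": {"EPUB", "PDF", "Kindle"},
    "Sony Reader PRS-505": {"EPUB", "PDF", "plain text"},
}


def can_display(reader, file_format, drm=None):
    """Return True if the reader handles the format and, for protected files, the DRM."""
    if file_format not in FORMAT_SUPPORT.get(reader, set()):
        return False  # unknown reader or unsupported format
    if drm is None:
        return True  # DRM-free file: the format check is sufficient
    return drm in DRM_SUPPORT.get(reader, set())


if __name__ == "__main__":
    print(can_display("Sony Reader PRS-505", "EPUB", "ADEPT"))
    print(can_display("Amazon Kindle", "EPUB", "ADEPT"))
```

A library help desk could use a lookup of this kind to answer the common question of whether a borrowed title will open on a patron's device; the real data would have to be kept current, since DRM restrictions can change with little warning.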


4.1.4.4. E-Book Reader Software 


Virtually all e-book file formats can be read on a 
computer once the proper software is downloaded. Many 
people may be unaware that they already own devices 
(computers, smart phones, PDAs) that can make use of 
e-books. There are many free e-book reader programs available 
for computers and smart phones. 


A selection of e-book reader software for computers and 
smart phones, with the formats each supports, is given below: 


Selection of E-Book Reader Software for Computers 


Adobe Digital Editions 
http://www.adobe.com/products/digitaleditions/ 
Formats: PDF, EPUB, XHTML 
PC and Mac 


Lexcycle Stanza 
http://www.lexcycle.com/download 
Formats: PDF, eReader, EPUB, Kindle, Mobipocket, 
HTML, Microsoft Reader, FictionBook, plain text, and more 
PC and Mac 


Barnes and Noble eReader 
http://www.barnesandnoble.com/ebooks/index.asp 
Formats: EPUB, eReader 
PC and Mac 


Amazon Kindle 
http://www.amazon.com/gp/feature.html/ref=kcp_pc_mkt_lnd?docId=1000426311 
Formats: Kindle 
PC and Mac 


Selection of E-Book Reader Software for Smart Phones 


Lexcycle Stanza 
http://www.lexcycle.com/ 
Formats: PDF, eReader, EPUB, Kindle, Mobipocket, 
HTML, Microsoft Reader, FictionBook, plain text, and more 
iPhone 


Barnes and Noble eReader 
http://www.barnesandnoble.com/ebooks/index.asp 
Formats: EPUB, eReader 
iPhone, BlackBerry 


Amazon Kindle 
http://www.amazon.com/gp/feature.html/ref=kcp_pc_mkt_lnd?docId=1000426311 
Formats: Kindle 
iPhone, BlackBerry 


Aldiko 
http://www.aldiko.com/ 
Formats: EPUB 
Android 


Fig. 4.1. E-Books and E-Book Readers 


4.1.4.5. Free E-Book Resources 


Virtually all works of popular classic literature are in the 
public domain, and can be downloaded free of charge and 
without any DRM restrictions. Some examples are given 
below. 


Table 4.7. Devices sold by manufacturers 


[Table not legible. For each device it gives the maker, model, year of 
introduction, screen size (in inches), number of shades, operating system, 
touch screen and Wi-Fi availability, and supported formats (.epub, .pdf, 
.azw, .djvu and others). Devices listed include the Sony Librié; Sony Reader 
PRS-500, PRS-505, PRS-700, Pocket Edition PRS-300, Touch Edition PRS-600 and 
Daily Edition PRS-900; iRex iLiad and Digital Reader 1000; Jinke Hanlin V2, 
V3 and V5; Bookeen Cybook Gen3 and Cybook Opus; Amazon Kindle, Kindle 2 and 
Kindle DX; Foxit eSlick; Hanvon N510, N518, N520 and N526; PocketBook 301; 
Elonex eBook; Endless Ideas BeBook One and BeBook Mini; Interead COOL-ER; 
Samsung Papyrus; Condor Technology Associates eGriver and eGriver Touch; and 
Barnes & Noble Nook.] 


(Note: Many of the 1st and 2nd generation e-reader devices from the 1990s 
to 2005 may be missing in the list.) 


ManyBooks 
http://manybooks.net/ 
Formats: EPUB, Kindle, Microsoft Reader, Sony 
Reader, Mobipocket, Rocket, HTML, RTF, plain text 
Genres: Fiction and non-fiction 


Project Gutenberg 
http://www.gutenberg.org/wiki/Main_Page 
Formats: EPUB, Mobipocket, HTML, plain text 
Genres: Out-of-copyright general fiction and non-fiction 


Internet Archive 
http://www.archive.org/details/texts 
Formats: EPUB, PDF, Kindle, HTML, plain text 
Genres: Out-of-copyright general fiction and non-fiction 


Wikibooks 
http://en.wikibooks.org/wiki/Main_Page 
Formats: PDF, Open Document 
Genres: Textbooks and how-to manuals 


Mobipocket 
http://www.mobipocket.com/freebooks/default.aspx 
Formats: Mobipocket 
Genres: General fiction and non-fiction 


Baen Free Library 
http://www.baen.com/library/defaultTitles.htm 
Formats: EPUB, Kindle, Microsoft Reader, Sony 
Reader, Mobipocket, Rocket, HTML, RTF 
Genres: Fantasy and science-fiction 


Planet eBook 
http://www.planetebook.com/ 
Formats: PDF 
Genres: Out-of-copyright general fiction 


4.1.4.6. E-book Challenges 


Since the 1970s, the development of electronic versions 
of printed books (e-books) has been part of the wider 
e-publishing phenomenon. The integration of e-books into the 
digital library has created not only opportunities for librarians 
but also several challenges. Full-text access and retrieval of 
e-books combine library-based theories and principles with 
web search and retrieval techniques. Librarians must develop 
innovative policies, procedures, and technologies to 
accommodate the publication of and access to e-books. 


E-book challenges for librarians can be grouped into 
three categories: acquisitions and collection development, 
standards and technology, and access. Within each of these 
categories are subcategories. Challenges for acquisitions and 
collection development include budget allocations, usage and 
distribution models, purchase models, and 
collection-development strategies. Challenges related to standards and 
technology include not only cataloguing and metadata 
standards but also e-book hardware and software, search 
technologies, digital rights management software, and user 
and staff training. Challenges associated with access include 
the cataloguing and indexing of e-books, circulation models 
for the electronic environment, and preservation and archiving 
of e-books and the resources linked to them. 


Publishers must also contend with challenges created 
by the emergence of the e-book. Since the Internet knows 
no boundaries, these include securing both electronic and 
territorial contractual rights for content and permission 
clearance. Publishers must become involved in the 
development of format identifiers, such as ISBNs, digital 
object identifiers (DOIs), and so forth. E-book metadata 
maintenance and delivery and compositor and e-book file 
delivery are new areas that require publishers to invest 
additional resources. Editorial and production workload, 
quality assurance, and sales reporting and accounting, 
including accounting for royalties from electronic content, 
require publishers to revise policies and procedures, to hire 
personnel with related knowledge and skills, and to train 
personnel in this new publishing area. Publishers must also 
develop methods for the storage and transmittal of e-book 
files for repurposing content. The marketing for and the 
publicity and sales integration of e-books also require 
publishers to revise current practices or to develop new 
practices. 


In spite of these challenges, progress has been made 
in the production and distribution of e-books. Librarians, 
publishers, e-book providers, and vendors of integrated library 
systems have worked together to implement and integrate 
acquisitions systems; test various collection development 
strategies; propose and adopt new, revised, and combined 
standards; provide new e-book hardware and software; 
identify and test new indexing and retrieval methods for full- 
text e-books; test new access and usage models; and initiate 
preservation and perpetual access agreements for e-books. 
As a result, several models for providing, distributing, 
accessing, and retrieving e-books have emerged. 


4.1.4.7. Ideal e-book Model 


The relationships with publishers are the keys to 
ensuring a steady flow of vetted content. An e-book provider 
should make available content from many publishers, allowing 
access to an additional distribution channel for publishers’ 
products. The contracted publishers should adequately 
represent academic, commercial and trade publishers. 


The one-book-to-one-user model allows only one 
person to access each title at one time. Publishers feel 
comfortable with this model, believing that their content, 
available on paper, will not be cannibalized in an electronic 
environment. Some publishers have invested in e-book 
content companies, both through outside providers and within 
their own organizations, and therefore have a vested interest 
in providing an effective e-book model. 
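
The one-book-to-one-user rule described above can be illustrated with a minimal Python sketch. The class name `OneUserPerCopyLending` and its methods are hypothetical and do not correspond to any vendor's system; the sketch only shows the core constraint that simultaneous users of a title may not exceed the number of copies the library has purchased.

```python
class OneUserPerCopyLending:
    """Toy circulation model: each purchased copy serves one user at a time."""

    def __init__(self, copies_owned):
        # copies_owned maps a title to the number of copies purchased.
        self.copies_owned = dict(copies_owned)
        self.loans = {}  # title -> set of users currently reading it

    def check_out(self, title, user):
        """Grant access if a copy is free; return True on success."""
        borrowers = self.loans.setdefault(title, set())
        if user in borrowers:
            return True  # the user already holds a copy
        if len(borrowers) >= self.copies_owned.get(title, 0):
            return False  # every copy is in use, as with a print book on loan
        borrowers.add(user)
        return True

    def check_in(self, title, user):
        """Release the user's copy so that another user can open the title."""
        self.loans.get(title, set()).discard(user)


if __name__ == "__main__":
    lending = OneUserPerCopyLending({"Manual of Digital Libraries": 1})
    print(lending.check_out("Manual of Digital Libraries", "user-a"))
    print(lending.check_out("Manual of Digital Libraries", "user-b"))
```

Buying a second copy of a heavily used title simply raises the count for that title, which is why this model maps so directly onto existing print acquisition practice.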


Quality content is one of the key factors in providing an 
effective e-book model, and publishers are instrumental in 
identifying the content that will be available electronically. A 
well-positioned e-book provider will have thousands of titles 
available that are identified and targeted for academic, public, 
school and corporate library collections. Libraries should have 
on staff librarians who have subject-area expertise in 
collection development as well as individuals from the 
publishing industry who are familiar with publishers’ areas of 
specialization. Available e-book collections should be focused 
in areas reflecting both the activity in the publishing market 
and the areas of high user interest. They should contain titles 
with current imprint dates as well as classic titles freely 
available in the public domain. 


The ideal e-book model will allow users to copy and 
print portions of content while complying with copyright and 
fair-use laws. Copyright compliance is of great importance to 
publishers since they are obligated to protect the intellectual 
property of their authors. The model should provide the secure 
rendering of digital content on-site, via web browsers, and 
via downloadable readers. Publishers must be confident in 
the e-book provider's digital-rights-management software and 
assured that dissemination of their content is secure. 


The delivery and distribution of e-book content should 
be customizable to meet each library’s needs. E-books are 
one of a library’s significant assets and should be platform 
independent; accessible worldwide, online via a web 
browser or off-line via an e-book reader; and capable of 
integration into the library’s online public access catalogue 
(OPAC) through MARC records provided directly by the 
e-book provider or through a bibliographic utility (e.g., OCLC, 
RLIN, RLG). Management of content, whether paper or electronic, 
is critical to librarians’ collection development, budget, user 
services, and circulation decision-making processes. The 
model e-book vendor should provide usage reports as well 
as reports of titles that are not used, thus enabling librarians 
to monitor and adjust their collection strategies and circulation 
models. The e-book provider should make it possible to assign 
circulation periods by title and/or collections and should 
develop and offer collection-development tools for reviewing 
and acquiring new content. 
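
The two reports mentioned above, titles in use and titles never used, can be derived from a simple access log. The following Python sketch assumes a hypothetical log that records one title per access; the function name `usage_report` and the sample data are illustrative only.

```python
from collections import Counter


def usage_report(catalogue_titles, access_log):
    """Return (used, unused): usage counts in descending order, and unused titles."""
    # Count only accesses to titles that are actually in the collection.
    counts = Counter(title for title in access_log if title in catalogue_titles)
    used = counts.most_common()
    unused = sorted(title for title in catalogue_titles if counts[title] == 0)
    return used, unused


if __name__ == "__main__":
    titles = {"Metadata Basics", "Digital Preservation", "Library Consortia"}
    log = ["Metadata Basics", "Digital Preservation", "Metadata Basics"]
    used, unused = usage_report(titles, log)
    print(used)
    print(unused)
```

The list of unused titles is as useful as the usage counts, since it points to areas where collection strategy or circulation periods may need adjusting.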


In this model, the e-book provider should offer customer 
services such as technical support, training, collection- 
development assistance, and marketing services. Technical 
support should be available to set up access to collections 
and management reports and to assist with MARC record 
integration. Both on-site and online training should be offered 
to library staff in addition to training and user documentation. 


An e-book provider should supply published e-book 
content to academic, public, school, and corporate libraries 
both directly and through distributors to accommodate 
libraries’ current acquisition processes. Some distributors that 
are currently cooperating with e-book providers in distribution 
agreements include Blackwell’s, Follett Corporation, EBSCO 
Information Services, Baker and Taylor, J. A. Majors, Coutts 
Library Services (including BMBC Limited in the United 
Kingdom), Teldan Information Systems in Israel, and 
Bibliotekstjanst AB in Sweden. The e-book distributor should 
have experience with the international market and provide 
content to library customers throughout the world. 


The model e-book distributor should make available an 
e-book MARC record for each offered title. Library customers 
should be able to acquire these records directly through the 
e-book provider or through a bibliographic utility. The model 
e-book distributor should have alliances with vendors of 
integrated library systems, such as Innovative Interfaces, 
SIRSI, and Follett Software Company, enabling librarians to 
incorporate e-book titles into their paper book collections. This 
allows a seamless interface for users and facilitates their 
access to e-book content. 


The model e-book provider may employ professional 
librarians who are available to collaborate with libraries’ 
collection-development staffs and to assist with the creation 
of MARC records for all e-book titles. The e-book distributor’s 
marketing team should provide promotional materials to 
librarians, the libraries’ users, and publishers. These services 
provide the conduit between library customers and the 
publishers of the available e-book contents. 


4.1.4.8. Future Directions 


Librarians must think beyond the paper book and utilize 
the capabilities of the e-book, which is more than an 
alternative to a paper book. Librarians should not make the 
mistake that was made when moving the paper card 
catalogue to the online environment—simply digitizing the 
catalogue card without considering the new possibilities for 
search and retrieval. Links from the e-book to dictionaries, 
thesauri, related images, photographs, electronic text, and 
audio and video segments should be incorporated. 


Now is also the time to enhance the bibliographic record. 
The table of contents and book indexes should be included 
in the bibliographic record since these are already digitized 
in the e-book format. Links to book reviews, electronic 
resources that are referenced in the book, and book 
summaries should also be included in the bibliographic record. 
Librarians need to work with publishers, technology providers, 
and e-book providers not only to map standards and schemes, 
such as the Dublin Core and ONIX, but also to integrate these 
into the MARC format. Full-text search capabilities of e-books 
should be integrated into our library online public access 
catalogues to enable users to search within the library’s 
electronic collection as well as across other available 
electronic collections. An example of libraries moving in this 
direction is CORC, which enables users to search across all 
types of electronic information, such as websites, electronic 
journals, e-books, newspapers, advertisements, and so forth. 
Library systems should also enable the integration of semantic 
searches that map and retrieve concepts and ideas in addition 
to keyword and known-item searches. 
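
Mapping a metadata scheme such as the Dublin Core into the MARC format is essentially a crosswalk from element names to MARC tags and subfields. The Python sketch below is a simplified illustration: the tag choices are loosely modelled on the Library of Congress Dublin Core to MARC crosswalk for unqualified elements and should be treated as assumptions to be checked against local cataloguing policy. The function name and the sample record are hypothetical.

```python
# Simplified crosswalk: Dublin Core element -> (MARC tag, subfield code).
# ASSUMED mapping for illustration; verify against the crosswalk in local use.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "rights": ("540", "a"),
}


def dublin_core_to_marc(record):
    """Convert a dict of Dublin Core elements into sorted (tag, subfield, value) tuples."""
    fields = []
    for element, values in record.items():
        if element not in DC_TO_MARC:
            continue  # elements without a mapping are skipped in this sketch
        tag, code = DC_TO_MARC[element]
        if isinstance(values, str):
            values = [values]  # allow a single value or a list of values
        for value in values:
            fields.append((tag, code, value))
    # Subfields of the same tag (e.g. 260 $b and $c) stay as separate tuples here;
    # a real converter would merge them into one MARC field.
    return sorted(fields)


if __name__ == "__main__":
    sample = {
        "title": "Manual of Digital Libraries",
        "creator": ["Dhiman, Anil K.", "Rani, Yashoda"],
        "publisher": "Ess Ess Publications",
    }
    for field in dublin_core_to_marc(sample):
        print(field)
```

A mapping for ONIX would follow the same pattern with a different table, which is why agreeing on the crosswalk itself matters more than the conversion code.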


These advances will move libraries into the digital world 
of our users. With the advancement of wireless technologies, 
library users’ expectations are changing as they become more 
knowledgeable about and more dependent upon technology. 
E-cars, high-tech automobiles with Internet access, allow 
individuals to check e-mail, monitor stocks, and keep up with 
sports scores without taking their hands off the steering wheel 
through the use of telematics, a wireless technology that 
transmits information to and from a vehicle. 


Users now have the capability to aggregate their 
electronic content into private digital libraries. Peer-to-peer 
technology that allows all types of files to be shared between 
individuals is facilitating this aggregation. But the question 
arises, if individuals are aggregating content to create their 
own information stores, will libraries and librarians become 
obsolete? The literature indicates that librarians will be needed 
to assist individual users with the retrieval and evaluation of 
electronic information. John Lombardi, speaking at the Annual 
Conference of the American Library Association in July 2000, 
suggested that the role of the librarian as gatekeeper will 
change as individuals become their own gatekeepers. He 
believes that librarians will digitize unique special collections 
and maintain and manage these collections. He also envisions 
librarians uniting to create a “mega” library catalogue and 
developing library portals to compete against commercial 
services. 


Nowadays, digital audiobooks are also available, which 
deliver a book’s content as spoken audio. 
In early November 2004 OverDrive began offering 
downloadable digital audiobook services to libraries. 
Unabridged, a digital audiobook delivery service for the blind 
(www.unabridged.info), was one of OverDrive’s first 
customers, culminating well over a year of cooperative 
discussions and trials with representatives of the blind 
community on how to make the new service accessible to all. 
Unabridged is a self-funded initiative that offers hundreds of 
downloadable digital audiobooks to eligible print-impaired 
users in five US states, where it is being used for in-house testing and 
evaluation as it prepares to launch its national digital 
audiobook service. 


The Unabridged team deliberately chose a small, soft 
launch for the service to ensure that it truly met the needs of 
the growing number of computer-savvy print-impaired library 
users who are anxious to access and enjoy downloadable 
digital audiobooks supplied by libraries and talking-book 
centers. The OverDrive team was keenly interested in making 
their new system accessible to all. They incorporated 
text-only and audio instructions into their Help system, offered 
keystroke alternatives for core commands, and designed into 
their system key functionalities, such as variable-speed 
playback, that are heavily used and appreciated by 
print-impaired readers. 


The Daisy format not only links the text to the audio 
version of the same book but also permits extensive 
tree-like navigation through the document. This makes it easy 
to jump to parts of the book, down to the sixth level of 
a table of contents. Furthermore, Daisy makes it possible to produce 
several kinds of e-books, such as text-only, audio-only, text and 
audio, and text and images. Daisy is promoted by the Daisy 
Consortium, and its standards are nowadays recognized by 
international bodies. 


Technically, a Daisy book consists of a collection of 
computer files. These can be stored on a CD, a popular choice 
because the book can then be read with a portable CD player, but also on any 
other computer medium, including SD cards (popular for 
pocket-size readers), or simply delivered via the Internet. Despite a 
decade of effort, the Daisy standard is still not in use 
outside the field of print-impaired users. To make it more 
popular, several open-source software solutions have been 
developed. It is thus possible to produce a Daisy talking book 
directly from within Microsoft Word. Within the European 
Aegis project, three add-ons for OpenOffice.org (file extension: 
.odt) have been developed: 


- an odt to Daisy converter 

- an odt to Daisy talking book converter 

- an odt to Braille converter (still under development) 


The Daisy Consortium itself focuses on: 


- DAISY Online Delivery Protocol: This is basically a 
definition of SOAP messages permitting easy 
web-based content provision. 


- Daisy version 4.0: This standard will permit an easier 
transfer of e-books and talking books to the ePub 
format. 


- Copyright protection for Daisy books. 


Up to now such protection was deemed unnecessary 
as special equipment or software was needed to read a Daisy 
book. 


In addition, the MI-DTB (Mid-Illinois Digital Talking Book) 
Project (www.midtb.org), funded with an ALA Leader in 
Library Technology Grant from the Sirsi Corporation, is 
confronting the dilemma between mainstream e-books and 
their separate-but-similar counterparts aimed at visually 
impaired users. While numerous systems, software programs, 
and hardware devices have been designed specifically for 
use by visually impaired people, consumer-oriented audio 
e-book services are gaining wide acceptance by the general 
population. Audible.com is very popular among commuters, 
joggers, cyclists, mall walkers, and others who want to listen 
to spoken-word content on the move. OverDrive, netLibrary, 
and other e-book companies have developed downloadable 
digital audiobook programs. 


Dozens of volunteers from around the nation have 
participated in MI-DTB. They relish the opportunity to try digital 
audiobooks in various formats and players, designed either 
specifically for the visually impaired or for the mainstream 
consumer market. Digital audiobooks in different formats are 
making more materials accessible for the visually impaired 
and making reading opportunities more flexible for the sighted. 
Users can listen to books on their computers; play them in a 
recorded or text-to-speech voice; move the file to a portable 
device of their own; or burn the file to a CD, if permitted by 
the access agreement. 


The work being done under the auspices of the 
MI-DTB Project will undoubtedly affect, to some degree, future 
developments in the digital talking-book field, including 
systems designed both for the blind and visually impaired 
and for general consumer services. The project itself has 
heightened vendor awareness of visually impaired people as 
a population that will purchase digital players and content 
and will provide suggestions about functionality and usability. 


Portable music devices like the iPod, meanwhile, have 
created a world where the individual’s control over the content, 
style, and timing of what he consumes is nearly absolute. 
Retailers and purveyors of entertainment increasingly know 
our buying history and the vagaries of our unique tastes. As 
consumers, we expect our television, our music, our 
movies, and our books on demand. We have created and 
embraced technologies that enable us to make a fetish of our 
preferences. 


The long-term effect of this thoroughly individualized, 
highly technologized culture on literacy, engaged political 
debate, the appreciation of art, thoughtful criticism, and taste 
formation is difficult to discern. But it is worth exploring how 
the most powerful of these technologies have already 
succeeded in changing our habits and our pursuits. By giving 
us the illusion of perfect control, these technologies risk 
making us incapable of ever being surprised. They encourage 
not the cultivation of taste but the numbing repetition of fetish. 
And they contribute to what might be called egocasting, the 
thoroughly personalized and extremely narrow pursuit of one’s 
personal taste. 


Today, the iPod—the portable MP3 player that can store 
thousands of downloaded songs—is our modern musical 
phylactery. Like those little boxes containing scripture that 
Orthodox Jewish men wear on the left arm and forehead 
during prayers, the iPod has become a nearly sacred symbol 
of status in certain communities. Introduced by Apple 
Computer only a few short years ago, the iPod is marketed 
as the technology of the disconnected individual, rocking out 
to his headphones, lost in his own world. 


The iPod and other technologies of personalization are 
conditioning us to be the kind of consumers who are, as 
Joseph Wood Krutch warned long ago, “incapable of anything 
except habit and prejudice,” with our needs always 
preemptively satisfied. Cass Sunstein, in his book 
Republic.com, argues that our technologies—especially the 
Internet—are encouraging group polarization: “As the 
customization of our communications universe increases, 
society is in danger of fragmenting, shared communities in 
danger of dissolving.” Borrowing the idea from MIT 
technologist Nicholas Negroponte, Sunstein describes a world 
where “you need not come across topics and views that you 
have not sought out. Without any difficulty, you are able to 
see exactly what you want to see, no more and no less.” 
Sunstein is concerned about the possible negative effects 
this will have on deliberative democratic discourse, and he 
urges website authors to include links to sites that carry 
alternative views. Although his solutions bear a trace of 
impractical, ivory-tower earnestness—you can lead a rabid 
partisan to water, after all, but you can not make him drink— 
his diagnosis of the problem is compelling. “People should 
be exposed to materials that they would not have chosen in 
advance,” he notes. “Unplanned, unanticipated encounters 
are central to democracy itself.” 


Sunstein’s insights have lessons beyond politics. If 
these technologies facilitate polarization in politics, what 
influence are they exerting over art, literature, and music? In 
our haste to find the quickest, most convenient, and most 
easily individualized way of getting what we want, are we 
creating eclectic personal theaters or sophisticated echo 
chambers? Are we promoting a creative individualism or a 
narrow individualism? An expansion of choices or a deadening 
of taste? 


If we talk about control freaks, iPods will never 
destroy us. But our romance with technologies of 
personalization has partially fulfilled Krutch’s prediction. We 
have not become more like machines. We have made the 
machines more like us. In the process we are encouraging 
the flourishing of some of our less attractive human 
tendencies: for passive spectacle; for constant, escapist 
fantasy; for excesses of consumption. These impulses are 
age-old, of course, but they are now fantastically easy to 
satisfy. Instead of attending a bearbaiting, we can TiVo the 
wrestling match. From the remote control to the iPod, we have 
crafted technologies that are superbly capable of giving us 
what we want. Our pleasure at exercising control over what 
we hear, what we see, and what we read is not intrinsically 
dangerous. But an unwillingness to recognize the potential 
excesses of this power— egocasting, fetishization, a vast 
cultural impatience, and the triumph of individual choice over 
all critical standards—is perilous indeed. 


Indeed, iPods are penetrating the larger educational 
world. Drexel University, Philadelphia, according to the 
Chronicle of Higher Education, is providing education students 
with iPods in an experiment to “evaluate the educational 
potential of the devices” and will even test audio blogging 
and podcasts of lectures. Besides, some libraries are 
circulating iPods to enhance and improve access to library 
services. The Duke Divinity School Library, Durham, North
Carolina, has launched a project that puts on iPods audio
instructions for using two electronic tools (BibleWorks and the
ATLA Religion Database) and for navigating the print exegesis
tools in the reference room. Librarians like the iPod feature that
alters playback speed because it enables time-starved 
students to listen to a lecture at a faster rate. “Conversely, 
the students who work with English as a second language 
can slow things down.” 


Libraries have been circulating audio players for years 
in foreign countries. Kalamazoo Public Library, Michigan, 
began an audio program with Audible.com in 2002. King 
County Library System, Washington, circulates Rio500 
players. Participants in the ListenIllinois and ListenOhio
projects circulate Otis players and other similar MP3 devices. 




“The iPod is a hip, ingenious product. iPod is the Beatles 
right now—we chose the right product,” Latini says, and Weil 
agrees— “People know what the iPod is, other brands are not 
known as well. Some people do not know what an MP3 is.” 
Beyond the trendiness factor, some believe it is a simple, 
cost-effective solution. “Duke has bought into iPods in a big 
way,” Keck states, “In many ways, it is easier and cheaper 
for the library to loan a few iPods loaded with the licensed 
and home-grown content than for every student to have an 
iPod for which content must be separately licensed and 
loaded.” Winnacunnet High’s Grazier agrees and praises the 
Shuffle, which is “less expensive and thus cheaper for us to 
purchase. It is also cheap enough to hold a student 
accountable for if it is lost or damaged.” 


But there are obstacles to deal with when considering 
the devices for libraries. Keck relates that the librarians at 
Duke quickly decided that checking out the Apple ear buds 
was probably not very sanitary and could actually discourage 
use. “We had some old clunky media-center headphones, 
but our student workers laughed so hard when they saw the 
giant headphones and the smaller iPods,” Keck says, “that 
we had to purchase smaller, cooler headphones.” Weil 
recognizes that such cutting-edge innovation “would not be 
for everybody. It would take time to adjust to new technology.” 
And what if the Shuffle is returned to the library blank or filled 
with other content? “Someone could erase it, sure,” Weil says. 
“But it is easy to correct—just plug it in and reload it with the 
audiobook files.” 


Jerry Kuntz, electronic resources consultant at the 
Ramapo-Catskill Library System, Middletown, New York, 
recently commented on web4lib that circulating iPods “is a 
great service, but one that it is not scalable to larger libraries 
because of the staff time needed: Staff must download the 
titles from the library’s iTunes account themselves... to a 
library PC and then transfer the files to library-owned iPods.”
Other tasks come into play as well: taking deposits, cleaning
headphones, and the like. “There is no way a larger library— 
or even a small library with tight staffing—can support this 
service model.” 


Interoperability is another issue. Kuntz and members 
of NYLINE, the New York State Library e-mail discussion 
group, are lobbying to get Apple to create partnerships with 
the digital audiobook companies already in the library market, 
like OverDrive or Recorded Books, that currently do not 
support the iPod. 


The relationship between the iPod and libraries is off 
and running. All it needs is more librarians recognizing more 
uses for the devices. An art library might circulate an iPod 
Photo with digitized images to support an art history course. 
With the included cable, the artwork could be reviewed on 
practically any television. Could libraries also give users a 
chance to load a circulating iPod via iTunes in the library? 
Talk about user-centered: “Here is an iPod Shuffle and a 
library of 100 songs; fill it with what you would like to hear.” 
Whatever happens, this seems like a match made in heaven. 
Winnacunnet High’s Grazier puts it simply. “iPods and libraries 
are both really cool.”


Libraries are also taking up podcasting, the latest tool
librarians are using to market the library. David Free,
Reference Librarian at Georgia Perimeter College, Decatur, 
has been experimenting with the format, producing a new 
show every two weeks. He says he first experienced 
podcasting as a consumer. “Then I began to wonder what
the library could do.” Discussions about podcasting on several 
library blogs motivated him as well. Podcasting, Free believes, 
has huge potential, especially with institutions that have a 
large 18- to 23-year-old user base. “They are used to 
electronic media and want content provided this way,” he 
says. Free designs 30-minute programs that students can
download and then listen to at their leisure. The programs
are available from the library's blog. Free uses shareware
audio recording software and says that other than a decent-quality
microphone and a web directory to house the file (and, of
course, staff time) there were no costs associated with
producing the shows. No word yet on the program’s popularity,
but Free was waiting until he had some experience before 
widely promoting it. 

Whatever the case may be, libraries are taking an
interest in such new media.


4.2. ONLINE OR VIRTUAL RESOURCES 


Online or virtual resources are electronic access
systems that provide users with the means for discovering
and retrieving information online. The information may be
retrieved directly from within the hosting system or through
links to other external systems. The intent is to offer, in
digital form, the same types of resources and services that
are available in a brick-and-mortar library.

Nowadays, online resources are becoming very
popular because information can be retrieved through them
quickly and accurately. Online resources on the Internet
manifest themselves in numerous flavours and categories.
Most of them emulate traditional publishing, while
others are revolutionary in their design and approach. While
the present trend to imitate and emulate the traditional models
of scholarly communication may continue for some time,
eventually the capabilities added by the new media would be
used in more innovative ways. The National Centre for Scientific
Information (NCSI) has categorized the information resources
available via the Internet as discussed below.


4.2.1. Primary Sources of Information 


Various types of primary sources of information are 
described below. 




Electronic Conferences : Electronic conferences,
variously known as electronic forums, electronic user groups,
listservs or discussion groups, are important resources for
researchers and scholars in every discipline. New scholars
in particular get an opportunity to discover what topics are
being discussed in their field, to learn who is involved in
these discussions, and to make themselves known within their
discipline by their own contributions. Calls for papers and other
professional announcements often first appear in electronic
forums. It is now commonplace for scholars to “meet” through
electronic conferences and forge collaborative relationships.
Listserv conferences are accessed by subscription. The lists
are handled by software programs such as “listserv”,
“comserve” and “listproc” that distribute messages from any
member to the whole group. The conferences work by the
distribution of information via e-mail; they are not real-time
conferences. Subscribers receive messages, known as
postings, in their e-mail.


Each list has the following two addresses:

— Listname address, which is used to send messages to all
members of the list. For example, the listname address
for digital libraries is: diglib@infoserv.nlc-bnc.ca.

— Host address, which is used for subscribing, unsubscribing
and other administrative business. For example, the host address
for digital libraries is: listserv@infoserv.nlc-bnc.ca.


To subscribe to a listserv conference, one generally
sends an e-mail message to the computer that maintains the
subscription list. For example, to subscribe to a listserv
conference called CDS-ISIS, an electronic listserv dedicated
to discussions on UNESCO’s CDS/ISIS text retrieval
program, an e-mail message can be sent to the following
address:

To: LISTSERV@NIC.SURFNET.NL
Cc: [leave this field blank]
Subject: [leave this field blank]

SUBSCRIBE CDS-ISIS [your first name] [your last name]
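
The subscription message described above can also be composed programmatically. The following is only a minimal sketch in Python, using the standard library's email package; the function name build_subscribe_message, the sender address and the subscriber's name are hypothetical, and the sketch only builds the message without sending it.

```python
from email.message import EmailMessage


def build_subscribe_message(sender, host_address, list_name, first_name, last_name):
    """Compose a LISTSERV-style subscription request (illustrative helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    # The request goes to the host (software) address, not to the list itself.
    msg["To"] = host_address
    # No Subject or Cc header is set: both fields are left blank.
    msg.set_content(f"SUBSCRIBE {list_name} {first_name} {last_name}\n")
    return msg


# Hypothetical sender and subscriber name, for illustration only.
msg = build_subscribe_message(
    "reader@example.org", "LISTSERV@NIC.SURFNET.NL", "CDS-ISIS", "Asha", "Rao"
)
print(msg["To"])
print(msg.get_content().strip())
```

Actually sending the message would require a mail server (for example through smtplib), which is outside the scope of this sketch.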


Note that the subscription message is not sent to the 
conference itself, but to the computer software address that 
maintains the list. After subscribing, the subscriber will receive 
a message that will include detailed instructions, commands 
and options that are required to maintain the subscription, as 
well as information on how to unsubscribe from the listserv. There
are more than 18,000 active listservs. The following Internet- 
based directories can be used to locate listservs in a given 
discipline: 

— Directory of Scholarly and Professional E-Conferences
(http://www.kovacs.com/directory.html)

— Tile.Net
(http://tile.net/)

— Publicly Accessible Mailing Lists
(http://paml.alastra.com/)

— Topica: Tool for Finding Discussion Lists
(http://www.topica.com/)

— Google Groups
(http://groups.google.com/)

— Yahoo Groups
(http://groups.yahoo.com/)


Courseware / Tutorials / Guides / Manuals : Web-based
educational tutorials or guides, called online courseware,
provide users with a higher degree of interactivity and
flexibility and the benefit of self-paced learning. The
courseware available on the Internet varies to a great extent
in coverage and quality, from basic lecture notes and lecture
support material to integrated and highly interactive tutorial
packages. Online courseware is at the forefront of
technological, multimedia and instructional innovation
designed to provide computer-based training to users over
the Internet. Some courseware packages are comprehensive
resource kits focused on developing practical skills that can
be applied immediately. They are amongst the electronic
resources created exclusively for the web, incorporating all
the features and facilities offered by the new technology.
Courseware is proliferating on the web as a strong contender
for distance education. Telecampus, Canada
(www.telecampus.edu/) lists more than 12,000 online
courseware packages available on the web. Institutions of
higher learning, especially distance and continuing education
departments, are actively supporting and contributing to the
development and implementation of computer-assisted
instruction and multimedia courseware. A portal site on
“Web-based Online Interactive Courseware in Information
Technology” is available at http://www.iitd.ac.in/courses/
under a project sponsored by the Ministry of Information
Technology. The site has 4,000 courseware packages, including
375 in the public domain. As a mirror site for the Central
Institute of Technology, New Zealand, the site includes around
25 courseware packages published by that institute.


Electronic Journals on Web : The terms electronic journals
and “e-journals” are used for journals and newsletters that
are prepared and distributed electronically. Electronic journals
may be defined very broadly as any journal, magazine,
e-zine, webzine, newsletter or other type of electronic serial
publication that is available over the Internet and can be
accessed using different technologies such as WWW,
Gopher, ftp, telnet, e-mail or listserv. Several traditional
journals are now being published both on the Web and in
print. Current issues or content lists for most of the journals
are available on the Web or distributed to subscribers as
e-mail text messages.


Internet-based electronic journals started to appear at
the beginning of the 1990s. These journals were mostly delivered
as attachments to e-mail, while their back issues were
mounted on anonymous ftp sites and users were required to
download them from these ftp sites. Libraries and
information centres made them accessible through their
Gopher sites. Gopher technology peaked in 1995 and
then declined suddenly and dramatically by 1997. With
the advent of WWW technology in 1993, electronic publishing 
became more than a novelty. The web as a means of delivery 
of electronic information has grown steadily since then. As 
publishers experiment with different publication modes and 
models, the very definition of a journal is undergoing change 
in the electronic environment. New journals that are
available only in electronic form have evolved, based on
the graphic capabilities of the Internet.


The number of online e-journals is growing exponentially.
Ulrich’s International Periodicals Directory (1999) reports that
of a total of 157,000 serials listed in the Directory, 10,332
are available exclusively online or in addition to their print
counterparts.


Patents : Patents are specifications concerning
the design or manufacture of products and processes that
are protected and secured for the exclusive profit of the
designer or inventor for a limited number of years, which varies
from fifteen to twenty years in different countries. The
Patent Office, the department that controls the registration of
patents in a country, provides the full text of patents registered
in that country through its web site. All patents
registered in the US are available free of cost through the United
States Patent and Trademark Office
(http://www.uspto.gov/main/patents.htm). There are several meta
resources, some of them dealing exclusively with patents, for
example, Krislyn’s Favorite Patents, Trademark & Copyright Sites
(http://www.krislyn.com/sites/patent.htm). Moreover, a free e-mail
service called “Patent Alert Service”
(http://www.patentalert.com/) is also available for people
interested in patents. Once subscribed, users
receive periodic updates about inventions recently
patented in the United States. The alerts, sent through
e-mail, include descriptions of patents in the fields
defined by the user. The updates are mailed daily, weekly,
bi-weekly or monthly, depending upon the user’s choice. A
user can customize his or her subscription or, alternatively,
specify a wider range of themes for a broader
view of affairs. Important patent-related sites are listed below:

— Canadian Patent Database
http://patents1.ic.gc.ca/intro-e.html

— Derwent Patents
http://www.derwent.com/

— IBM Intellectual Property Network
http://www.patents.ibm.com/

— Krislyn’s Patents, Trademark & Copyright Sites
http://www.krislyn.com/sites/patent.htm

— NIC Patent Cell
http://pk2id.delhi.nic.in/sera.html

— Patent Alert Service
http://www.patentalert.com/

— Patent Cooperation Treaty Electronic Gazette
http://pctgazette.wipo.int/

— US Patents & Trademark Office
http://www.uspto.gov/main/patents.htm

— World Intellectual Property Organization
http://www.wipo.org/




Electronic Preprints and E-prints : Electronic preprints 
are the research articles that are made available for 
distribution through the network in electronic format before 
they go through the process of peer reviewing. Ginsparg 
preprint archive (http://www.arXiv.org/), started in 1991, has
become a fundamental means of communication for a 
growing number of fields, starting with theoretical high-energy 
physics, later spreading to other areas of physics, and now 
also to computer science and mathematics. Ginsparg’s 
preprint archive is a sterling example of how technology can 
lead to a sudden, profound, and beneficial transformation. 
This archive processes 25,000 submissions, which is 
substantial, but small compared to around two million papers 
in all science, technology, and medicine areas. It receives 
two-thirds of its two million weekly hits from institutions outside 
the United States, including many research facilities in 
developing regions. The archive has become indispensable 
to researchers worldwide, but in particular to research 
institutions that would otherwise be excluded from the front 
line of science for economic and sociological reasons. The 
success and wide adoption of arXiv has prompted new 
thinking about the reform of scientific publishing in other 
disciplines. Scientists have become aware of the many 
benefits conferred by open archiving, such as the removal of 
the cost barrier to high-priced journals, the reduction of time 
in announcing research findings, and the provision of access 
to all with Internet capability. As a result, other e-servers have 
been set up and the movement to free scientific publishing 
from financial restrictions has been growing steadily. A few 
examples of preprint and e-print servers are:


— Ginsparg Preprint Archive (arXiv.org)
(http://www.arXiv.org/)

— UK e-Print archive mirror
(http://xxx.soton.ac.uk/)

— Open Archives Initiative
(http://www.openarchives.org/)

— PubMed Central
(http://www.pubmedcentral.nih.gov/)

— American Mathematical Society Preprint Server
(http://www.ams.org/preprints/)

— CERN Preprint Server
(http://preprints.cern.ch/)

— Chemical Physics Preprint Database
(http://www.chem.brown.edu/chem-ph.html)

— Chemistry Preprint Server
(http://www.chemweb.com/preprint)

— Economics Working Paper Archive
(http://econwpa.wustl.edu/wpawelcome.html)

— SISSA Preprint Server
(http://babbage.sissa.it/)

— High Energy Physics Preprint Database
(http://wwwspires.slac.stanford.edu/FIND/hep)

— Nitride Semiconductor Research Preprint Server
(http://nsr.mij.mrs.org/preprints/)

— Clinical Medicine and Health
(http://clinmed.netprints.org/)

— Department of Energy’s PrePRINT Network
(http://www.osti.gov/preprint/)

— Theoretical Chemistry Preprints
(http://www.chemie.uni-regensburg.de/pub/preprint/GENINFO.html)


‘E-prints’ is the term generally used to describe
electronically mounted copies of the final, peer-reviewed
versions of journal articles. Among the best known proponents
of e-print developments is Dr. Stevan Harnad of the University
of Southampton. Dr. Harnad advises authors to self-archive
their published papers (postprints), a practice which, if adopted
widely, would lead to the ultimate removal of cost barriers to the
exchange of publicly funded research information. These
developments have generated much debate, and a number
of international initiatives have evolved to refine and
standardize the archiving procedures. One important
international movement is the Open Archives Initiative (OAI),
which aims to develop and promote the use of a standard
protocol, known as the Open Archives Initiative Protocol for
Metadata Harvesting (OAI-PMH), designed for better sharing and
retrieval of e-prints residing on distributed archives. There are
various forms of open archiving. The term ‘self archiving’ is often used
to refer to the process whereby individual authors submit their
own papers to a server or archive of their choice. There are
‘institutional archives’, where authors submit e-prints to a
server administered by an organization or scholarly society,
commonly their university or research institute. There are also
discipline-based archives and other specialty archives. An
important example of a specialty archive is the Electronic
Research Archive in International Health (ERA), set up by
the long-established international medical journal, The Lancet.
This archive allows medical researchers to deposit papers of
special relevance to health issues encountered in many
developing countries. Papers submitted are reviewed before
acceptance and are thereafter archived and available online
free to all. The University of Southampton has produced some
free software to simplify the process of uploading and
downloading PDF files. The California Digital Library is one
institution that has installed the software.
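
The metadata harvesting protocol promoted by the Open Archives Initiative is published as OAI-PMH. As a rough illustration of how a harvester might use it, the Python sketch below builds a ListRecords request URL and extracts Dublin Core titles from a response. The base URL http://archive.example.org/oai and the sample response are invented for illustration, the helper names are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return every Dublin Core title found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.iterfind(".//dc:title", NS)]


# Invented response from a hypothetical archive, used instead of a live server.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:archive.example.org:1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>A sample e-print</dc:title>
     <dc:creator>Author, A.</dc:creator>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://archive.example.org/oai"))
print(extract_titles(SAMPLE))
```

A real harvester would also have to follow resumption tokens and handle protocol errors, which this sketch leaves out.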


Projects (Ongoing and Completed) : Several
agencies award time-bound research undertakings, with
well-defined goals and/or tangible products or services, to
individuals, groups of individuals and institutions. Information
on projects that are ongoing or completed is
now easily available through directories and compilations,
including: (i) compilations by sponsoring agencies; (ii)
compilations by the institutions that receive projects from
various sponsoring agencies; and (iii) other compilations and
directories. Such compilations provide a list of research
projects currently underway in specified fields with a brief
description of each project, including details of the investigators
and place of investigation. A few examples of such
compilations are given below:


— Boston University: Research Projects Directory
(http://scv.bu.edu/PROJECTS/)

— Knowledge Discovery in Databases: Projects
(http://orgwis.gmd.de/explora/pages.htm)

— Science experiments in Physics, Mathematics,
Nonlinear Sciences, and Computer Science
(http://xxx.lanl.gov/)

— Signal Processing Information Base (SPIB)
(http://spib.rice.edu:80/spib.html)

— Social Science Research Resources
(http://socsci.colorado.edu/POLSCI/RES/research.html)


Science/Research News : Science and research news
is an important source of information for scientists and
technologists, particularly on the most recent developments.
Several core disciplines have periodicals devoted exclusively
to publishing science, research and technical news for a given
discipline. Some of the important resources on science and
research news include:


— UniSci: International Science News
(http://unisci.com/)

— Earth Research: Research News
(http://www.earthresearch.com/inks.shtm)

— Combigenix News
(http://www.combigenix.com/news/)

— NewsCenter: Up to the Minute News
(http://gwis2.circ.gwu.edu/~gprice/tech.htm)

— MagazineCity.net: Science News
(http://store.yahoo.com/magazinecity/3326-52.html)

— The Scientific World Newslab
(http://www.thescientificworld.com/)

— Scoop! Personalised News Service
(http://www.scoopdirect.com)

— The Ultimate News Links Page
(http://[...].net/links/news/)


Software : A large number of free software packages
and scripts of all kinds are available on the Internet.
Under the GNU General Public License (GPL), people have the
freedom to run, copy, distribute, study, change
and improve the software. Some of the sites that provide free
software are as follows:


— Downloads.com
(http://download.cnet.com/)

— GNU Free Software Directory
(http://www.gnu.org/directory/isting.html)

— Freeware Home
(http://www.freewarehome.com/)

— Jumbo: Free and Shareware
(http://www.jumbo.com/)

— Freeware Palm
(http://www.freewarepalm.com/)

— ZDNet
(http://www.zdnet.com/zdi/software)

— Shareware.com
(http://www.shareware.com)

— Chemistry Software Exchange
(http://nhse.npac.syr.edu:8015/rib/repositories/Csit/catalog/index.html)

— Engineering Software on the Internet
(http://www.engcen.com/software.htm)


Standards : Standards are agreed targets for the 
performance, or an accepted format for the operation of a 
system. Technical standards specify how materials and 
products should be manufactured, defined, measured or 
tested according to proven and accepted methods. Standards 
may be issued by companies, or by other organizations both 
national and international. The Bureau of Indian Standards 
(BIS), an independent national body funded by the 
Government of India, is the largest originator of standards in 
India. Indian Standards are available in digital format on
CD-ROMs but not on the Internet as yet. Some standards
are accessible free, while others require a subscription or
payment on a pay-per-transaction basis. Standards are issued by
various national and international organizations such as ANSI,
ISO, IEEE and NIST.


Standards are very important in both the library and
computer fields. MARC and its variants are bibliographic
standards that are used most extensively in libraries for
cataloguing of bibliographic records. Similarly, AACR-II is a
standard for rendering, display and printing of bibliographic
records. The Universal Decimal Classification (UDC) scheme is
a British Standard (BS-1000). Some of the important websites
providing information on standards are as follows:


— British Standards Institution
(www.bsi-global.com/)

— Bureau of Indian Standards
(http://www.bis.org.in/)

— ASTM International
(http://www.astm.org)

— American National Standards Institute (ANSI)
(http://www.ansi.org/)

— Deutsches Institut für Normung (DIN)
(http://www.din.de/)

— IEEE Standards
(http://ieeexplore.org/ipdocs/epic03/standards.htm)

— Ei Web Standards
(http://www.ei.org/)

— World Standards Services Network
(http://www.wssn.net/WSSN/index.html)


Technical Reports : A technical report is a scientific 
paper or an article that provides a detailed account of work 
done on a particular project. Technical reports are generally 
prepared by the research workers themselves for submission 
to their employer, funding agency or to others interested in 
the work. Report literature is generally not published in
journals or conference proceedings; it is
issued by research and development agencies like NASA,
NTIS, INIS, GE, RAND, etc. Some of the important Internet-based
sources of information for technical reports are:


— DOE's Scientific and Technical Literature
(http://www.osti.gov/bridge/)

— National Technical Information Service
(http://www.ntis.gov/)

— NASA Technical Reports Server
(http://techreports.larc.nasa.gov/cgi-bin/NTRS)

— Networked Computer Science Technical Reference Library
(http://www.ncstrl.org/)

— SCS Technical Report Collection
(http://reports-archive.adm.cs.cmu.edu/)

— Marshall Technical Report Server
(http://mstrs.Msrs.nasa.gov/)


Electronic Theses and Dissertations : Theses
submitted to universities as a requirement for the award of
a Ph.D. degree constitute a useful source of information on
new and ongoing research. A thesis contains the record of
an original contribution to knowledge. Although a large
number of doctoral theses are submitted to every university
each year, they are not being used to their fullest potential
because most libraries keep them in closed-access
collections. Doctoral dissertations submitted to universities
and academic institutions are originally created in digital
format using word processing software packages like
MS Word, LaTeX, WordPerfect, Word Pro, etc. or one of the
desktop publishing packages like PageMaker, Ventura, etc.
These documents are undisputedly highly valuable collections,
especially in digital format, and qualify to be an important
component of a digital library. Documents composed with
word processing or desktop publishing packages can
be easily converted into PDF, PostScript or XML using
appropriate software tools so as to host them on the web.
Several universities and institutions have already
implemented electronic submission of doctoral dissertations
under the overall umbrella of an international digital library
initiative called the “Networked Digital Library of Theses and
Dissertations” (NDLTD). IIT Bombay and the University of Mysore
are already members of the NDLTD initiative, which has 125
members from all over the world.


The Networked Digital Library of Theses and
Dissertations (NDLTD) is a project initiated at Virginia
Tech and funded by SURA and SOLINET. It
promotes electronic submission of doctoral dissertations and
makes them accessible to scholars the world over. Virginia
Tech has developed tools for students to submit their
electronic dissertations both as SGML and PDF. Tools have
also been developed to coordinate the development and
implementation of a distributed digital library system so that
ETDs from all participating institutions can be easily accessed.
These search tools, available at http://www.theses.org/, allow
browsing and searching across all distributed sites.
The software tools are made available free of cost to
institutions that formally join the NDLTD project. Some of the
important sites for electronic theses and dissertations are:


— Networked Digital Library of Theses and Dissertations
(http://www.theses.org/)

— Academic Dissertation Publishers
(http://www.dissertation.com/)

— Theses and Dissertations
(http://www.umi.com/)

— UMI Digital Dissertations
(http://wwwlib.umi.com/dissertations/main/)

— VT ETDs from the Scholarly Communications Project
(http://scholar.lib.vt.edu/theses/)


Links to all 125 member libraries, with further information on
the status of individual member institutions, are available from
http://www.ndltd.org/members/.


4.2.2. Databases, Data Sets and Collections 


Databases, datasets and collections are discussed below:


Abstracting and Indexing Databases : A database is
a collection of records pertaining to a specific field of study.
An increasing number of bibliographic databases with
abstracts of chapters in books, journal articles and conference
proceedings are now available on various media. The availability
of CD-ROM and, more recently, DVD-ROM as media with
high storage capacity, longevity and ease of transportation
triggered the production of several CD-ROM-based information
products, including several bibliographic databases that
were earlier available only through online vendors or as
abstracting and indexing services in printed format.
Thousands of CD-ROM databases are currently available
from a multitude of CD-ROM producers, including SilverPlatter,
which alone produces more than 250 CD-ROM information
products. Several full-text databases also started appearing
in the late 1980s and early 1990s, launching
a new digital era. Most of the databases that were
earlier available on CD-ROM or through online vendors
are also now available on the web with added functionality
and features. Some of the important online databases
accessible on the Internet include:


— AGRICOLA
(http://www.nal.usda.gov/ag98/)

— Beilstein Abstracts on ChemWeb
(http://www.chemweb.com/databases/beilstein/)

— ERIC Databases
(http://ericir.syr.edu/Eric/)

— GPO Online: US Government Printing Office Database
(http://www.access.gpo.gov/su_docs/db2.html)

— PubMed Medline
(http://www.ncbi.nlm.nih.gov/PubMed/)

— Recent Advances in Manufacturing
(http://www.eevl.ac.uk/ram/index.html)

— SciBASE
(http://www.thescientificworld.com/scibase/)

— Energy Files
(http://www.doe.gov/EnergyFiles/)

— PubScience
(http://pubsci.osti.gov/)


Online hosts like Dialogweb, STNEasy, Engineering 
Village, etc. are also making their databases web accessible 
on subscription. 


Citation Databases : A citation is a reference to an
article or part of an article that identifies the document in
which it may be found. The references given at the end of an
article are called “cited articles”, while the article that
provides the references is called the “citing article”. A
citation index consists of a list of cited articles, each
followed by the articles that cite it.
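
The structure of a citation index described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the way any particular citation database is built: the function name `build_citation_index` and the article labels are hypothetical sample data. It inverts the reference lists of citing articles so that each cited article is followed by the articles that cite it.

```python
from collections import defaultdict


def build_citation_index(reference_lists):
    """Invert {citing article: [cited articles]} into
    {cited article: [citing articles]}."""
    index = defaultdict(list)
    for citing, cited_list in reference_lists.items():
        for cited in cited_list:
            index[cited].append(citing)
    # Sort entries so the printed index is stable and easy to scan.
    return {cited: sorted(citing) for cited, citing in sorted(index.items())}


# Hypothetical sample data: each key is a citing article and each
# value is the reference list printed at the end of that article.
reference_lists = {
    "Article C": ["Article A", "Article B"],
    "Article D": ["Article A"],
    "Article E": ["Article B", "Article C"],
}

citation_index = build_citation_index(reference_lists)
for cited, citing in citation_index.items():
    print(cited, "<-", ", ".join(citing))
```

Looking up a known work in `citation_index` then answers the typical citation search: which later articles have cited it.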


ISI Citation Databases are multidisciplinary databases
of bibliographic information gathered from thousands of
scholarly journals. They are indexed so that one can search
for specific articles by subject, author, journal and / or
author address. Each article in the ISI databases includes
the article’s cited reference list, often called its
bibliography. It is, therefore, possible to search the
databases for articles that cite a known author or work. The
important citation indices produced by the Institute for
Scientific Information (http://www.isinet.com/) are as
follows:


— Science Citation Index Expanded 
— Social Sciences Citation Index 

— Arts and Humanities Citation Index 
— ChemSciences Citation Index 

— BioSciences Citation Index 
— Clinical Medicine Citation Index 


Citation products and services from the Institute for
Scientific Information are available through a new product
called “Web of Science” (http://www.webofscience.com/).
The Web of Science is designed to perform searches on ISI
citation databases and to navigate easily through the
results. The articles retrieved from the databases available
in the Web of Science are linked to their full text on the
publisher’s site. Moreover, from a full record, one can view
lists of the article’s references, articles that cite the
article, or articles that are related to it (those that share
references or co-citations).
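
The idea of related articles that share references can be illustrated with a short Python sketch. It is an assumption-laden illustration, not a description of how the Web of Science computes related records: the function name `shared_references` and the article and reference labels are hypothetical. Articles are ranked simply by the number of references they have in common with a chosen article.

```python
def shared_references(reference_lists, article):
    """Rank other articles by how many references they share with `article`."""
    target = set(reference_lists[article])
    related = {}
    for other, refs in reference_lists.items():
        if other == article:
            continue
        common = target & set(refs)
        if common:
            related[other] = len(common)
    # Most shared references first; ties broken alphabetically.
    return sorted(related.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: article -> its cited reference list.
reference_lists = {
    "Article P": ["Ref 1", "Ref 2", "Ref 3"],
    "Article Q": ["Ref 2", "Ref 3"],
    "Article R": ["Ref 3", "Ref 4"],
    "Article S": ["Ref 5"],
}

related = shared_references(reference_lists, "Article P")
print(related)
```

Here "Article S" does not appear in the result because it has no reference in common with "Article P".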


Digital Collections : The Internet and web technology
are a suitable platform for multimedia websites that include
information in the form of text, images, sounds and movies.
The web hosts a rich collection of sounds and images, many
of which can be used for commercial as well as personal
purposes. Some of the multimedia information products on
the web require special plug-ins such as Flash and other
Macromedia players. A few examples of multimedia digital
collections on the web are:


— NASA's Multimedia Gallery
(http://www.nasa.gov/hqpao/library.html)
— The Great Buildings Collection
(http://www.greatbuildings.com/)
— The Nine Planets
(http://seds.lpl.arizona.edu/nineplanets/nineplanets/nineplanets.html)

It is now much easier to track down an image with an 
impressive arsenal of specialized image search engines 
available on the Internet. Some of the important image search 
engines on the web include: 


— AltaVista Image Search
(http://www.altavista.com/sites/search/simage)
— Ditto
(http://www.ditto.com/)
— Excite
(http://www.excite.com/)
— FAST Multimedia Search
(http://multimedia.alltheweb.com/)
— Google Image Search
(http://images.google.com/advanced_image_search)
— HotBot
(http://hotbot.lycos.com/)
— Ithaki Image and Photo Metasearch
(http://www.todalanet.com/images/)
— IXQUICK
(http://www.ixquick.com/)
— Lycos Multimedia Search
(http://multimedia.lycos.com/)
— Picsearch
(http://www.picsearch.com/)
— Scour
(http://www.scour.com/)
— Yahoo! Picture Gallery
(http://gallery.yahoo.com/)
— Big Search Engine Index to Images
(http://www.search-engine-index.co.uk/images-Search)


Equipment / Product Catalogues : A web-based
catalogue is a listing of products along with their complete
specifications. Equipment / product catalogues are generally
searchable. Catalogues are especially helpful for companies
in identifying recent products available in the market in
order to purchase them. User reviews of the products are
often included on the site as well. Important examples of
product catalogues are:


Camie-Campbell Product Catalogue
(http://www.camie.com/prod_brochures.htm)

Sony Electronic Products
(http://www.sonystyle.com/home/home.jsp)

Minolta Europe
(http://www.minoltaeurope.com/products/products.html)

DesignInfo.com
(http://www.Designinfo.com/)

System Optimisation Information
(http://www.sysopt.com/)

Marketplaces Providing Product Catalog Services
(http://www.sourceguides.com/markets/bvS/cat/Catalog.shtml)


Scientific Data Sets (Numeric, Property, Structural
Databases) : Scientific data sets are databases that contain
factual data, such as numeric, property and structural
information, on the topic indexed. The data collections are
critically assessed by individual experts or groups of
experts and are therefore an authoritative source of
information for researchers. Important examples of scientific
data sets are:


Aladdin Database Server
(http://www-amadis.iaea.org/)

Data Analysis in the Social Sciences
(http://uts.cc.utexas.edu/~fackler/data.html)

GrainGenes
(http://wheat.pw.usda.gov/)

LIGAND
(http://www.genome.ad.jp/dbget/ligand.html)

The Nuclear Explosions Database
(http://www.agso.gov.au/information/structure/isd/database/nukexp_query.html)


Library Catalogues : Librarians were among the earliest
inhabitants of the Internet and the web, and they soon
started putting their content online. Not only did libraries
build meta resources for their home pages, they also
web-enabled their library catalogues. Most standard library
software packages have web interfaces to their catalogues.
Several integrated library packages are now moving towards
carrying out all operations through Internet clients. The
sites given below provide links to library WebPACs:


— Libweb - Library WWW Servers
(http://sunsite.berkeley.edu/Libweb/)
— The LibDex
(http://www.libdex.com/)
— The British Library
(http://www.bl.uk/)
— National Library Catalogues Worldwide
(http://www.library.uq.edu.au/ssah/jeast/)
— Library of Congress (LOC)
(http://lcweb.loc.gov)
— Library of Congress WWW/Z39.50 Gateway
(http://lcweb.loc.gov/z3950/)
— Library of Congress Catalogue
(http://catalog.loc.gov/)
— Supersearch
(http://www.nbcls.org/)
— Melvyl Homepage
(http://www.melvyl.ucop.edu/)
— WebCATS
(http://www.lights.com/webcats/)




Museums and Archives : Virtual museum websites let
users visit a museum virtually and examine the exhibits
closely from their desktops. Using various tools and
techniques, the user is also able to rotate an object in any
direction. Art auction sites are also using similar
techniques to promote the auction of their art works. Some of
the virtual museum and auction sites are:


— Virtual Library Museums Pages
(http://www.icom.org/vlmp/)
— Smithsonian Institution
(http://www.si.edu/)
— World Wide Arts Resources
(http://wwar.com/)
— Art Museum Network
(http://www.amn.org/)
— The Virtual Museum of Computing
(http://www.comlab.ox.ac.uk/archive/other/museums/computing.html#museums)


Virtual Libraries : The term “virtual library” or “library
without walls” usually refers to the meta resources or subject
portals that extend virtual accessibility of digital collections 
from several diverse sources without the users even knowing 
where the resource actually resides. A virtual library could 
potentially be enormous, linking huge collections from all 
around the globe, or it could be very small, consisting of a 
few hundred links to digital resources maintained by an 
individual. 


4.2.3. E-Books on the Web 


Project Gutenberg started digitizing public-domain texts
for download in 1992. The project has a team of volunteers
re-keying texts. It offers more than 3,000 public-domain
titles free. New kinds of businesses are now emerging on a
new scale, involving a large number of publishers, to make
thousands of books available online to libraries and
individuals at relatively low cost. Three major companies
that have recently emerged in this market are Questia, ebrary
and NetLibrary. All three platforms offer e-books, journal
articles and encyclopaedia articles, besides other services
as value additions.


Questia: the Online Library (http://www.questia.com):
Questia plans to deliver 250,000 electronic books in the
fields of humanities and social sciences directly to users
for a reasonable subscription fee. More than 235 publishers
have signed up to provide content to Questia. Questia employs
experienced librarians on its team for collection
development. Collections at Questia can be searched free of
cost.


Ebrary (http://www.ebrary.com/) : Ebrary markets its
services directly to libraries in order to “augment the
library services and provide its patrons with access to
complete text contained within published, authoritative
content”. Ebrary has negotiated arrangements with more than
100 publishers. Searching and browsing in ebrary is free, but
users pay for downloading and printing. Unlike other e-book
publishers, ebrary obtains the files directly from publishers
in PDF rather than digitizing them.


NetLibrary (http://www.netlibrary.com/) : NetLibrary
targets academic, public and corporate libraries and has
published collection development policies for these main
areas. NetLibrary has more than 40,000 titles, and the MARC
records for its contents are available on OCLC. Access to
titles within NetLibrary mimics the traditional library
circulation model, i.e., only one user at a time can view a
copyrighted text within the collection.
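
The one-user-at-a-time circulation model described above can be sketched as follows. This is a hypothetical illustration in Python, not NetLibrary's actual implementation: the class `EBookCollection`, its methods and the sample title and reader names are all assumptions made for the example.

```python
class EBookCollection:
    """Toy model of single-user access: each title can be viewed by
    only one reader at a time, as with a printed book on loan."""

    def __init__(self, titles):
        # None means the title is currently available.
        self._borrower = {title: None for title in titles}

    def check_out(self, title, user):
        """Grant access if nobody else is viewing the title."""
        if self._borrower[title] is None:
            self._borrower[title] = user
            return True
        return False

    def check_in(self, title, user):
        """Release the title, but only for the user who holds it."""
        if self._borrower[title] == user:
            self._borrower[title] = None
            return True
        return False


collection = EBookCollection(["Sample Title"])
first = collection.check_out("Sample Title", "reader_1")   # granted
second = collection.check_out("Sample Title", "reader_2")  # refused: in use
collection.check_in("Sample Title", "reader_1")
third = collection.check_out("Sample Title", "reader_2")   # granted now
print(first, second, third)
```

A real service would also need time limits on each loan so that an idle reader does not block a title indefinitely; that is left out of this sketch.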


Online Bookselling : Amazon.com started a new
phenomenon on the web with its online bookshop, which has
been expanded to include other products like CDs, music,
electronics, toys, art works, computers and other store
items. Amazon.com has been termed the “Earth’s Biggest
Library”
(http://www.infotoday.com/newsbreaks/nb1122-1.htm),
although it does not perform all the functions of a library.
Several sites are now in the business of online bookselling.
Some of them are mentioned below:


Abebooks.com
(http://www.abebooks.com/)

Amazon.com Bookstore
(http://www.amazon.com/)

Barnes & Noble
(http://shop.barnesandnoble.com/booksearch/isbninquiry.asp?)

Best Book Buys
(http://www.bestbookbuys.com/)

Book Finder
(http://www.bookfinder.com/)

Catalog Site
(http://www.catalogsite.com/)

Pricescan Before You Buy
(http://www.pricescan.com)

Studentsbookworld.com
(http://www.studentsworld.com/)

Swotbooks.com
(http://swotbooks.com/)

Varsitybooks.com
(http://www.varsitybooks.com/)


Print-on-Demand : Print-on-Demand books are digitally
printed from electronic files by high-quality laser printers,
and then bound and cut. The process replaces traditional
paper media with digital print files. Printing becomes an
on-demand process in which the end-user determines the
requirement for printed copies. Substituting the digital file
for paper media does not change the publishing process, but
it eliminates the need to distribute and stock printed media.
File servers are used as the publication stockroom and
networks are used to distribute documents. The
print-on-demand method is quite new and is a cost-effective
and efficient way to print one copy at a time.
Print-on-demand services use new photocopying technology
combined with streamlined binding methods and economical
full-colour digital printing to produce 100, 250 or 500 books
that look as good as if they were produced with traditional
printing and binding equipment.


The process by which documents are printed in high
volume had not changed much over the last 50 years until
now. With Print-on-Demand (PoD) solutions, publishers and
other outlets can now print what they want, where they want
it and when they want it. One of the notable benefits of PoD
technology is the ability to create a single document from a
variety of different file formats: PoD solutions allow users
to create a document containing pages from virtually any
number of applications. The shelf-life of information is
getting shorter all the time. Because on-demand printing
requires little or no set-up time, document production can
begin as soon as document creation is complete. Since
on-demand printers can be networked, documents can be
printed at locations worldwide as they are needed. The
technology, including hardware and software, is compact,
relatively inexpensive, multifunctional and networkable.
Soon, one will be able to walk into a bookstore and get any
book printed on demand. The phrase “out-of-print” could soon
be out of vogue. Barnes & Noble and Barnesandnoble.com are
planning to use the latest technology to print books to
order. This new twist to publishing will cut costs and
improve inventory management. Previously, more than one
million books have been out of print, with 90,000 titles
disappearing each year. Many publishers were forced to turn
down quality books with valuable editorial content because
there was no market to justify the costs. This
custom-printing service is the long-awaited solution for
persons needing a title that is out of print because of a
small press run.


Barnes & Noble, leading publishers, has bought an
enormous amount of content previously unavailable to
readers. This content will now be able to reach the
marketplace. More recently, NetLibrary has announced its
entrance into the Short Run and Print-on-Demand
marketplace. IBM, Xerox, Lightning and Sprout are some
other active players in this field.


4.2.4. Reference Sources 


The web hosts an extraordinarily rich and varied
collection of reference books that have been ‘published’ on
the web for some years. Commercial publishers, recognizing
the potential of web delivery, have converted their most
important reference works into web-based reference services,
backed by professional promotion and customer support. There
have already been some notable achievements: The Oxford
English Dictionary, the Grove Dictionary of Art, and the
large reference works published by the Gale Group are
pioneers in this gradual migration of reference resources to
the World Wide Web.

Xrefer (http://www.xrefer.com/) specializes in reference
works on the web. The founders of Xrefer took it as their
mission to aggregate and integrate reference works.
Aggregation of reference works involves bringing diverse
works together into a common website and then providing
users with a search engine that executes searches on the
complete aggregated library of reference content. While
aggregation of reference works leads to efficient
distribution and ‘power searching’ across a range of titles,
integration ensures that the whole collection of reference
material contains index terms and annotations linked to
related entries found in disparate sources. A compelling
integration strategy leads to improvements in browsing and
navigation. Various reference sources are discussed below.
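
The aggregation of reference works described above can be pictured with a minimal Python sketch. It is not a description of Xrefer's system: the function names `aggregate` and `search`, the work titles and the entry texts are hypothetical sample data. Entries from several works are merged into one index keyed by headword, so that a single search runs across all the titles at once.

```python
from collections import defaultdict


def aggregate(works):
    """Merge several reference works into one index keyed by headword."""
    index = defaultdict(list)
    for work_title, entries in works.items():
        for headword, text in entries.items():
            # Lower-case the headword so that lookups ignore case.
            index[headword.lower()].append((work_title, text))
    return index


def search(index, term):
    """Return every entry for `term`, whichever work it comes from."""
    return sorted(index.get(term.lower(), []))


# Hypothetical sample data: two small reference works.
works = {
    "Sample Dictionary": {
        "Metadata": "Data that describes other data.",
    },
    "Sample Encyclopaedia": {
        "Metadata": "Structured information about a resource.",
        "Catalogue": "A list of the items in a collection.",
    },
}

index = aggregate(works)
hits = search(index, "metadata")
for work_title, text in hits:
    print(work_title, ":", text)
```

Integration, in the sense used above, would go one step further by adding links from each entry to related entries in the other works; the sketch covers aggregation only.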


Dictionaries : Thousands of general-purpose and 
subject-specific dictionaries are now available on the web. A 
few important dictionaries available on the Internet are 
mentioned below: 


— Academic Press Dictionary of S&T
(http://www.harcourt.com/dictionary/)
— DictSearch: Search in Online Dictionaries
(http://www.foreignword.com/Tools/dictsrch.htm)
— Dictionary of Phrase and Fable
(http://www.bartleby.com/81/)
— Important Online Dictionaries
(http://www.yourdictionary.com/)
— Cambridge Dictionary Online
(http://dictionary.cambridge.org/)
— Merriam-Webster Online
(http://www.merriam-webster.com/)


Electronic Encyclopaedia : The availability of enormous
storage space on CD-ROM, coupled with sophisticated search
software, led to the appearance of several encyclopaedias on
CD-ROM. Later, web versions of these encyclopaedias became
available as important reference tools on the web.
Encyclopaedia Britannica can be treated as an example of a
formerly flourishing business that fell into trouble in just
a few years by neglecting electronic media. The business
collapsed and was sold to Jacob Safra, who is investing
additional funds to cover losses and revamp it. The expensive
sales force has been dismissed and, while print versions can
still be purchased from bookstores, the focus is on
electronic products. This collapse occurred even though
Encyclopaedia Britannica had more than two centuries of
tradition behind it and was by far the most scholarly and
best known of the English-language encyclopaedias.


While Encyclopaedia Britannica was still sold at US
$1,500.00 - $2,500.00, the market was flooded with $50
CD-ROM encyclopaedias. Although they had neither the same
quality of content nor the nicely printed volumes, they did
have superior searchability, portability and an irresistible
price. It is important to note that, after some abortive
attempts to sell CD-ROMs first at $1,200 and then at $300,
Encyclopaedia Britannica is now offering its CD-ROMs for $125
or even less. Web versions of several important
encyclopaedias are available over the Internet. A few
examples are given below:


— Encyclopaedia Britannica
(http://www.britannica.com/)
— Kirk Othmer Encyclopedia of Chemical Technology
(http://jws-edck.interscience.wiley.com:8093/index.html)
— Nupedia.com
(http://www.nupedia.com/)
— Columbia Encyclopedia
(http://www.bartleby.com/)
— Encarta Encyclopedia
(http://encarta.msn.com/)
— Important Encyclopedia
(http://www.encyberpedia.com/cyberlinks/links/index.html)




Biographies : Biographical sources provide information
about people considered important in various disciplines.
The Internet is an excellent source of biographical
information, whether it is available in a biographical source
or through the websites of individuals / organizations. There
are several biographical sources available on the Internet.
Some of the important ones are mentioned below:


— Biography.com
(http://www.biography.com/)
— Genealogy.com
(http://www.genealogy.com/)
— Lives, the Biography Resource
(http://amillionlives.com/)
— World Biographical Index
(http://www.biblio.tu-bs.de/wbi_en/)
— Xrefer
(http://xrefer.com/)
— Biographical Dictionary
(http://www.s9.com/biography/search.html)
— Famous Physicists
(http://cnr2.kent.edu/~manley/physicists.html)


Acronyms and Abbreviations : Acronyms and
abbreviations are used extensively in day-to-day
communication. They are also part of the vocabulary of
subject-specific disciplines; information technology, for
example, has several acronyms and abbreviations that are in
daily use. The Internet hosts several good resources for
finding acronyms and abbreviations. A few of them are listed
below:


— Acronyms and Abbreviations
(http://www.ucc.ie/info/net/acronyms/index.html)
— Alphabet Soup Explained
(http://members.aol.com/nigthomas/alphabet.html)
— BABEL
(http://www.cis.columbia.edu/glossary.html)
— AF: Acronym Finder
(http://www.acronymfinder.com/)
— StarBits Acronyms, Abbreviations, and so on
(http://cdsweb.u-strasbg.fr/~heck/stbits.htm)
— Abbreviations and Acronyms of the U.S. Government
(http://www.ulib.iupui.edu/subjectareas/gov/docs-abbrev.html)


Thesauri and Subject Headings : A thesaurus may be
defined either in terms of its function or its structure. In
terms of function, it is a terminological control device used
for translating from the natural language of documents into a
controlled vocabulary. In terms of structure, a thesaurus is
a controlled and dynamic vocabulary of semantically and
generically related terms which covers a specific domain of
knowledge. A number of thesauri of the most commonly used
terms in various fields have been published in order to
achieve uniformity of indexing terminology in their
respective fields. Subject headings are the words or groups
of words under which books and other material on a subject
are entered in a catalogue in which the entries are arranged
in alphabetical order. Lists of subject headings are used by
cataloguers to achieve uniformity. Typical examples of
standard subject headings used in libraries are: Library of
Congress Subject Headings (LCSH), Medical Subject Headings
(MeSH), Subject Headings in Engineering (SHE) and Sears List
of Subject Headings (SLSH). Some of the thesauri and subject
headings available on the Internet are:


— Roget’s Thesaurus
(http://www.thesaurus.com/Roget-Alpha-Index.html)
— M-W Thesaurus
(http://www.m-w.com/mw/thesaurus.htm)
— Medical Subject Headings
(http://www.nlm.nih.gov/mesh/meshhome.html)
— Roget’s Thesaurus Online
(http://www.bartleby.com/62/)
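
The functional view of a thesaurus given in this section, a device for translating natural-language terms into a controlled vocabulary, can be illustrated with a small Python sketch. The entries, field names and helper functions below are hypothetical and are not taken from any published thesaurus; they only show how non-preferred terms can lead to a preferred term, and how broader and related terms might be recorded.

```python
# Hypothetical thesaurus: preferred term -> its relationships.
THESAURUS = {
    "e-books": {
        "use_for": ["electronic books", "ebooks", "digital books"],
        "broader": ["electronic resources"],
        "related": ["e-journals"],
    },
    "e-journals": {
        "use_for": ["electronic journals", "online journals"],
        "broader": ["electronic resources"],
        "related": ["e-books"],
    },
}


def build_lookup(thesaurus):
    """Map every entry term (preferred or not) to its preferred term."""
    lookup = {}
    for preferred, relations in thesaurus.items():
        lookup[preferred.lower()] = preferred
        for variant in relations["use_for"]:
            lookup[variant.lower()] = preferred
    return lookup


def to_preferred(term, lookup):
    """Translate a natural-language term into the controlled vocabulary.
    Returns None when the term is not covered by the thesaurus."""
    return lookup.get(term.strip().lower())


lookup = build_lookup(THESAURUS)
print(to_preferred("Electronic Books", lookup))
print(THESAURUS["e-books"]["broader"])
```

An indexer and a searcher who both pass their terms through such a lookup end up using the same heading, which is the uniformity that controlled vocabularies aim for.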


Handbooks and Manuals : Handbooks are treatises on a
special subject containing concise information written
primarily for practitioners. A number of handbooks are
available on the web in various subject specialities. Some of
them are:


Country Studies / Area Handbooks
(http://lcweb2.loc.gov/frd/cs/cshome.html)

Automotive Learning On-line
(http://www.innerauto.com/innerauto/htm/auto.html)

Earthquake Preparedness Handbook
(http://www.lafd.org/eqindex.htm)

Foodborne Pathogenic Microorganisms and Natural Toxins
(http://vm.cfsan.fda.gov/~mow/intro.html)

Handbook for Digital Projects
(http://www.nedcc.org/digital/dighome.htm)

Handbook of Forensic Services
(http://www.fbi.gov/hq/lab/handbook/intro.htm)

Merck Manual of Diagnosis and Therapy
(http://www.merck.com/pubs/mmanual/)


Maps : Maps constitute a special collection in a library,
consisting of documents that give a plane representation of
the earth’s surface or a part of it, indicating its physical
features, political boundaries, etc. The Internet contains a
large number of sites that provide maps and other
geographical information. With the availability of tools and
techniques offered by Geographical Information Systems (GIS)
and associated geo-coded data, there are several sites that
provide computer-based geo-sensitive information. Some of the
important sites that provide maps and GIS-based information
services include:


DEMIS World Map Server
(http://www2.demis.nl/mapserver/Mapper.asp)

HRW World Atlas
(http://go.hrw.com/atlas/norm_htm/world.htm)

Quick Map of the World
(http://www.theodora.com/maps/abc_world_maps.html)

Maps.com
(http://www.maps.com/explore/atlas/)

Worldtime
(http://www.worldtime.com/)

Mapmachine: National Geographic
(http://plasma.nationalgeographic.com/mapmachine/)

Mapnet Visual Search Engine
(http://maps.map.net/start)

Geosource
(http://www.library.uu.nl/geosource/)

USGS NSDI Clearinghouse
(http://nsdi.usgs.gov/products/gnis.html)

USGS Mapping Information Geographic Names
Information System (GNIS)
(http://mapping.usgs.gov/www/gnis/)


4.2.5. Organizations and People 


The Internet hosts a plethora of information about people
and organizations, either through the websites that these
organizations or people maintain themselves or through
various other websites that contain information on them.
Further, the Internet also hosts compilations such as
biographical sources and directories containing information
on people and organizations respectively.


Employment / Career Sources : The Internet is a good
source of information both for employers and for those who
are seeking employment. Important employment and career
sources on the Internet are:


— EmploymentSpot.com
(http://www.employmentspot.com/)
— Jobs.com
(http://www.jobs.com/)
— JobStar: California Job Search Guide
(http://jobstar.org/)
— Academic Employment Network
(http://www.academploy.com/)
— Employment Service
(www.employmentservice.gov.uk/)
— Employment.com.au
(http://www.employment.com.au)
— Ajob4scientists.com
(http://www.ajob4scientists.com/)
— IT Careers Web for Indian Prof.
(http://www.winjobs.com/)
— Scijobs.org
(http://www.scijobs.org/)


Funding / Grants Sources : Information on funding and
grant-giving agencies can be easily sourced through the
Internet. Most grant-giving agencies have their own websites.
Moreover, there are websites that provide information on
various grant-giving agencies. Some of the important Internet
resources are as follows:


— The Regional Alliance: Resources
(http://ra.terc.edu/resources/)
— SRA International Grants Web
(http://www.srainternational.org/newweb/grantsweb/index.cfm)
— Funding opportunities for training in biological and
medical sciences
(http://www.grantsnet.org/)
— 100 Top College, University and Scholarship Pages
(http://www.college-scholarships.com/100college.htm)


Libraries / Information Centres : Having recognized
the importance of the Internet in providing better services
to users, libraries have established their presence on the
web through library home pages, which serve as an integrated
interface to the various network-based services a library
offers. Information sources on the Internet are becoming an
essential ingredient in the collection development of the
library. A large number of libraries are making their
appearance on the web, not only in the developed countries
but also increasingly in the developing world. The LibDex
(http://www.libdex.com/), which maintains a worldwide
searchable directory of library websites, lists more than
17,000 libraries. Each record in the index provides links to
web-based OPACs (Online Public Access Catalogues). Further,
Libweb, the Digital Library SunSITE project
(http://sunsite.berkeley.edu/Libweb/) maintained by the
University of California at Berkeley, lists more than 6,100
libraries with websites from over 100 countries, organized by
type of library for United States listings and by continent
and alphabetically for others. Some of the important
libraries, library catalogues, union catalogues and sources
of information for libraries and information centres are as
follows:


— Libweb - Library WWW Servers
(http://sunsite.berkeley.edu/Libweb/)
— The LibDex
(http://www.libdex.com/)
— The British Library
(http://www.bl.uk/)
— Library of Congress
(http://lcweb.loc.gov)
— Library of Congress WWW/Z39.50 Gateway
(http://lcweb.loc.gov/z3950/)
— Library of Congress Catalogue
(http://catalog.loc.gov/)
— Supersearch
(http://www.nbcls.org/)
— Melvyl Homepage
(http://www.melvyl.ucop.edu/)
— WebCATS
(http://www.lights.com/webcats/)


Organizations / Research Institutes / Companies /
Societies : The Internet is an excellent source of
information on organizations, business houses, research
institutions, companies, societies and associations. Since
most of these bodies have a presence on the Internet through
their own website or through other websites that list them,
they can be found through any of the web search engines.
Moreover, the Internet also hosts compilations and
directories containing information on institutions and
organizations. Some of the important sources on organizations
/ research institutes / companies / societies on the Internet
are as follows:


Associations on the Net
(http://www.ipl.org/ref/AON/)

The Nation Directory
(http://www.thenation.com/directory/)

International Organizations and NGO Websites
(http://www.uia.org/website.htm)

GuideStar: the National Database of Nonprofit
Organizations
(http://www.guidestar.org/)

Helping.org
(http://www.helping.org/)

Researching Companies Online
(http://home.sprintmail.com/~debflanagan/index.html)

Hoover’s Online - the Ultimate Source for Company
Information
(http://www.hoovers.com/)

Thomas Register Online
(http://www.thomasregister.com/)

iCollege
(http://www.icollege.com/)

Associations Online Search Directory
(http://info.asaenet.org/gateway/OnlineAssocSlist.html)


People / Experts / Scientist Directories : The Internet
hosts a plethora of information about people, experts and
scientists, either through the pages that these people
maintain on their institute’s site or on a personal website,
or through various websites that contain information on
people, experts and scientists. Further, the Internet also
hosts compilations like biographical sources, telephone
directories, regional directories, etc. There are several
“Ask-an-Expert” or “Ask-a-Scientist” sites. They can be used
to obtain profiles of leading personalities / subject experts
in specific fields. Details regarding their areas of
expertise, affiliation, contact information, research
interests, etc. can also be obtained. Some of the important
sites are as follows:


AgNIC : Agricultural Network Information Center
(http://www.agnic.org/)

Profiles in Science-Biomedical stars
(http://www.profiles.nlm.nih.gov)

Bin Laden, Osama
(http://www.pbs.org/wgbh/pages/frontline/shows/binladen/)

Caesar, Julius
(http://www.virgil.org/caesar/)

Clarke, Arthur C.
(http://www.lsi.usp.br/~rbianchi/clarke/)

Cleopatra, Queen of Egypt
(http://www.fmnh.org/cleopatra/)

Einstein, Albert
(http://www.albert-einstein.org/)

Gates, Bill
(http://www.microsoft.com/billgates/)

Women in Biology
(http://pingu.salk.edu/~forsburg/bio.html)

Ask-A-Scientist
(http://olbers.kent.edu/alcomed/Ask/ask.shtml)

Scientific American: Ask-the-Experts
(http://www.sciam.com/askexpert/index.html)

Lycos’ Whowhere
(http://www.whowhere.lycos.com/)


4.2.6. Meta Resources 


Meta resources, variously called subject gateways,
subject-based information gateways (SBIGs), subject-based
gateways, subject index gateways, virtual libraries,
clearinghouses, subject trees, pathfinders, guides to
Internet information resources, and a few more variations
thereof, are facilities that allow easier access to
network-based resources in a defined subject area.


4.3. ADVANTAGES OF E-RESOURCES 
There are a number of advantages of electronic
resources, such as:
— They allow remote access.
— They can be used by many users simultaneously.
— They are interactive and allow interaction between
author / publisher and users.
— They provide timely access to documents.
— They support searching capabilities.
— They accommodate unique features such as links to
related items.
— They eliminate printing and postage costs.
— They do not require physical processing.
— They can easily be merged with alerting services.
— They provide improved access through full-text
searching.
— They can solve the problem of missing issues of
journals.


4.4. DISADVANTAGES OF E-RESOURCES 


The following are some of the disadvantages of
e-resources:

— High initial infrastructure and installation cost.
— Special equipment is needed for access.
— Lack of compatibility among different publishers.
— Hardware and software compatibility issues between
publishers and users.
— Excessive printing of documents.
— Difficulty inherent in reading a large amount of data
on a screen.
— They cause more concern about copyright.


4.5. E-RESOURCES: ISSUES 


Several issues have to be taken into consideration in
e-resource building, as discussed below.


Organizational Issues : E-resource building is oriented
towards the users' point of view and depends to a great
extent on organizational culture, objectives and effective
strategic planning.


Procurement Issues : Procurement and installation of
hardware, software, communication and networking facilities
require the guidance of experts in the concerned areas.
Internet connectivity, Internet-based library applications,
networking, etc. require the technical skills of IT experts.


Financial Issues : The initial expense of developing the
necessary infrastructure for e-resource building is quite high.
Further, it requires recurring expenditure for maintenance
and continuation of the related services.


Formats : It is quite a difficult task to decide on a file
format from the number of available formats, such as PDF,
ASCII, HTML, GIF, TIFF, etc.


Access to E-resources : Issues regarding access to
information via the Internet or through the corporate intranet,
besides hardware support, system support, standardization,
ease of use, upgradation, etc., have to be considered
properly.


Security : This concerns the protection of passwords
and information from misuse, hacking, etc.


Retention : This concerns the retaining, copying and
downloading of articles under IPR (intellectual property
rights) rules and regulations.


Licensing : E-resources are acquired via licenses that
grant access to the electronic copy for a specific period of
time, with usage governed by the terms and conditions
negotiated in the license.
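

As an illustration only, licensing terms of this kind can be
modelled as a simple record that a library system consults
before granting access. The sketch below is in Python. The
class name License, its fields and the concurrent-user limit
are assumptions made for this example; they are not taken from
any actual license agreement or library system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class License:
    """Hypothetical record of negotiated license terms."""
    resource: str
    start: date                # first day of licensed access
    end: date                  # last day of licensed access
    max_concurrent_users: int  # assumed usage condition

    def permits(self, on: date, active_users: int) -> bool:
        """Return True if one more user may be admitted on the given day."""
        in_period = self.start <= on <= self.end
        return in_period and active_users < self.max_concurrent_users


journal = License("Sample E-Journal", date(2024, 1, 1), date(2024, 12, 31), 5)
print(journal.permits(date(2024, 6, 15), active_users=2))  # within period, seats free
print(journal.permits(date(2025, 1, 10), active_users=0))  # license period is over
```

Real licenses usually carry further conditions (permitted
sites, interlibrary loan, archival rights) that such a record
would have to be extended to cover.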


4.6. ARCHIVING OF E-RESOURCES 


Archiving is not a new concept to librarians, since they
have been doing this job ever since libraries started acquiring
printed materials. But with technological developments and
the generation of documents in electronic form, they now have
to deal with e-resources as well. Archiving of e-resources
assumes greater importance because of the large number of
e-resources being produced, both online and offline. Most
libraries and publishers are concerned with archiving
e-resources so that the information remains available to the
scholarly community for long-term access.


The process of archiving requires preserving, storing
and organizing the archived data and providing effective
research facilities for it. Therefore, librarians and library
professionals should possess knowledge of creating and
maintaining databases, along with knowledge of computers,
computer languages and the features of the archiving software
when it is purchased from a vendor. Besides this, other
factors such as cost, time and technology should also be
considered while planning for archiving.


The content of any e-resource is the property of its
publisher. Publishers supply it to end users either directly or
through a third party, i.e., an aggregator or vendor. Thus the
responsibility for archiving e-resources rests primarily on
publishers, then on aggregators/vendors/copyright holders,
and lastly on end users, i.e., libraries.


4.7. E-RESOURCE ARCHIVING : SOME ISSUES 


E-documents are particularly vulnerable, since the very
development of technology continuously makes the hardware
and software that contain them outmoded. A web archive has
to solve the technical problems facing all e-documents as
well as problems unique to the web. Web information is
generated continuously, and it is not discrete but linked; as a
result, the boundaries of the object to be preserved are
ambiguous.


Electronic media, software applications and computer
hardware all continue to change at a rapid rate. Therefore
policies must be developed to address that reality, and the
archive must change with it. The very notion of a permanent
or fixed archive has to give way to an ecological preservation
system that is in a state of constant change.


Users of an e-archive must be sure that the content they
find in the archive is the content that the author created and
the publisher published, as in the case of a print collection.
But the e-archive has the responsibility to migrate content
through numerous generations of hardware and software, in
step with rapid changes in technologies of all sorts. Content
should include the data, and the discourse about the data, that
authors submit to the publisher, and revisions should be
prompted by peer review, copy editing or editorial comments.
Further, the archive will not compete with the publisher's
presentation; rather, there should be value-adding activities
that do not have a major impact on the reader's ability to
read the content but may help the reader to locate, interact
with and understand it.


As print collections dwindle under the financial 
pressures exerted upon libraries to go electronic, we must 
face this question— What guarantees that these digital 
documents will become a permanent part of the human 
record? Currently there is no guarantee, and that is troubling 
indeed. It does not take long for a “current” technology to 
become obsolete. The Library already houses extensive 
material in a variety of formats, such as wax cylinders and 
eight-track tapes, whose playback mechanisms are outmoded 
or in disrepair. When this happens, the original content is, in 
effect, lost. So it will be with e-resources on the web if the 
stakeholders in scholarly communication do not act soon. How 
long before the web evolves to a point where it will become 
obsolete, superseded by some newer transmission 
mechanism? When that happens, what will happen to web- 
based PDF journals? We have no ready answer to date. 


E-resource archiving therefore involves some level of
organization and preservation to enable potential use.
Libraries have to develop systematic methodologies for their
archived e-collections to sustain the content, infrastructure
and physical aspects of the material. Standards therefore
have to be developed to scale the process of archiving. The
roles of stakeholders and new standards for archiving
e-resources need to be established, which requires
communication and collaboration between traditionally distinct
groups, including publishers, vendors, aggregators, authors,
researchers and libraries.


4.8. FACTORS OF ACQUIRING ELECTRONIC 
RESOURCES 


Various factors are associated with acquiring electronic
resources in a library, such as selection, acquisition and
long-term access. Each is briefly discussed below :


Selection: Research is being carried out by many 
archives and libraries into the best methods to give access to 
electronic materials in the very long term. Because of the 
sheer quantity of material being produced, particularly for 
access via the World Wide Web, selection is essential. Many 
archives and libraries use the existing selection criteria 
relevant to printed materials for electronic materials as well. 
They consider that the contents of the document are the 
relevant factors for selection and not the medium. This means 
that the physical carrier, the hardware and the software used 
are not relevant for the selection process. Local policy defines 
the criteria for selection, e.g., in Germany audiovisual material 
is included in the national bibliography, in some other 
countries it is not. 


Acquisition and Registration: Off-line publications can
often come to the library in the same way as printed
publications. Obviously, when the library starts collecting
off-line publications, the publishers have to be notified. In
the Netherlands, where deposit is done on a voluntary basis,
it is important that the publishers are kept informed about
the new selection criteria. In France, the law defines which
publications are to be submitted. On-line publications require
a new form of co-operation: the publication has to be
transmitted from the host system to the library via the
network.


Selected documents are either ordered, transferred
automatically by the publisher, or harvested by the library
with a harvester application. For on-line documents,
acquisition means the physical migration of the document from
the host system to the depository system. The
publisher/producer, or the administrator for archives, needs
to be involved in this process. It is necessary to register
documents when they are received by the library. This
requires the exchange of bibliographic information between
the depository library and the publisher, preferably before
acquisition. The registration of incoming documents should be
activated on arrival.


Installation: It is necessary to install the electronic 
publication so it can be viewed and described by the librarian. 
For on-line documents, a connection to the host system is 
required and off-line documents have to be physically installed 
on a workstation. 


Description of the Document: Cataloguing systems for
electronic documents are still the subject of much debate.
Various groups are discussing how to describe an electronic
document. The existing book-based systems such as MARC
and its variants do not fully describe these new formats.


Metadata: Electronic publications offer an opportunity 
to automate part of the production of a catalogue. 
Bibliographic data can be retrieved from the electronic 
publication itself, e.g. from the table of contents (TOC). A 
research project of the European Commission, BIBLINK, is 
studying how data can be exchanged between publisher and 
library in an automated way. The Dublin Core defines the 
fields that are necessary to support adequate bibliographic 
description of a Web page. 


Dublin Core has received significant support,
particularly from North America, including from some
publishers. A threat that may ultimately make it unacceptable
is that the Dublin Core contains too many features that
require definition at the national level or that carry a large
maintenance overhead.
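

To make the idea of Dublin Core description concrete, the
following Python sketch holds a record for a Web page using
the fifteen elements of the simple Dublin Core set and renders
it as HTML meta tags. The "DC." prefix follows a common
convention for embedding Dublin Core in HTML. The function
name dc_meta_tags and the sample record are hypothetical and
serve only as an illustration.

```python
from html import escape

# The fifteen elements of the simple Dublin Core element set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_meta_tags(record: dict) -> list:
    """Render a Dublin Core record as a list of HTML <meta> tags."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    tags = []
    for element in DC_ELEMENTS:          # keep a stable element order
        if element in record:
            value = escape(str(record[element]), quote=True)
            tags.append(f'<meta name="DC.{element}" content="{value}">')
    return tags


# Hypothetical record for a sample Web page.
page = {
    "title": "Sample Web Page",
    "creator": "A. Author",
    "date": "2001-05-14",
    "format": "text/html",
    "language": "en",
}
for tag in dc_meta_tags(page):
    print(tag)
```

All elements are optional and repeatable in Dublin Core; the
sketch keeps one value per element only for brevity.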


Unique Identification: In the international book trade,
the identification numbers ISBN (International Standard Book
Number) and ISSN (International Standard Serial Number) are
widely used to uniquely identify a particular version of a
monograph or serial publication. ISBN and ISSN are also used
for CD-ROMs and on-line publications like electronic journals.
However, these numbers were not designed for electronic
publications, and a proposal was therefore made for a Digital
Object Identifier (DOI). The DOI was designed by the
Association of American Publishers and the Corporation for
National Research Initiatives, which are taking measures in
this direction.
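

Both ISBN and ISSN end in a check digit, which allows a
mistyped number to be caught at registration. The Python
sketch below applies the standard check-digit rules: weights
10 to 1 modulo 11 for the ten-digit ISBN, alternating weights
1 and 3 modulo 10 for the thirteen-digit ISBN, and weights 8
to 1 modulo 11 for the ISSN. The function names are
hypothetical, and the sample numbers are used only to
exercise the arithmetic.

```python
def _digits(code: str) -> list:
    """Strip hyphens and spaces; a final 'X' stands for the value 10."""
    chars = code.replace("-", "").replace(" ", "").upper()
    values = []
    for position, ch in enumerate(chars):
        if ch in "0123456789":
            values.append(int(ch))
        elif ch == "X" and position == len(chars) - 1:
            values.append(10)
        else:
            raise ValueError(f"unexpected character {ch!r} in {code!r}")
    return values


def is_valid_isbn10(code: str) -> bool:
    values = _digits(code)
    weighted = sum(w * v for w, v in zip(range(10, 0, -1), values))
    return len(values) == 10 and weighted % 11 == 0


def is_valid_isbn13(code: str) -> bool:
    values = _digits(code)
    weighted = sum(v * (1 if i % 2 == 0 else 3) for i, v in enumerate(values))
    return len(values) == 13 and 10 not in values and weighted % 10 == 0


def is_valid_issn(code: str) -> bool:
    values = _digits(code)
    weighted = sum(w * v for w, v in zip(range(8, 0, -1), values))
    return len(values) == 8 and weighted % 11 == 0


print(is_valid_isbn10("0-306-40615-2"))
print(is_valid_isbn13("978-0-306-40615-7"))
print(is_valid_issn("0378-5955"))
```

A check digit only detects transcription errors; it says
nothing about whether the number has actually been assigned
to a publication.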


Authenticity and Integrity: Some electronic publications
can easily be changed. What guarantee is there that the
bibliographic description defines exactly the version which is
stored? And will it still do so after the lapse of several years
and the migration to other carriers and formats? This is still a
very tricky area. Several methods are being considered, e.g.
time stamps, encryption and watermarks. But a final solution
to this issue has yet to be found.
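

One widely used aid, not named in the methods listed above, is
a fixity check: a cryptographic digest of the stored file is
recorded with its description and recomputed later to detect
any change. The Python sketch below uses SHA-256 from the
standard hashlib module. The function names and the sample
content are hypothetical.

```python
import hashlib


def fixity_digest(content: bytes) -> str:
    """Return the SHA-256 digest of the stored bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()


def is_unchanged(content: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at registration."""
    return fixity_digest(content) == recorded_digest


original = b"Volume 1, Issue 1: full text of the deposited article"
recorded = fixity_digest(original)   # stored with the descriptive record

print(is_unchanged(original, recorded))                 # same bytes
print(is_unchanged(original + b" (edited)", recorded))  # altered copy
```

A digest shows only that the bytes are unchanged. After a
migration to a new format the bytes differ by design, so a new
digest has to be recorded, and the check cannot by itself show
that the migrated version is faithful to the original.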


De-installation: After the bibliographical and technical
description, the electronic publication must be removed from
the hard disk of the computer and any on-line session must
be closed. This activity generates new information, which
should be included in the descriptive record.


Migration and Storage : Other factors that have to be 
considered when collecting electronic documents include the 
following: 


— Migration of the electronic content from the original
carrier to the physical storage of the depository system,
including migration quality control and duplication for
backup.
— The physical storage system will probably use different
types of media with different access speeds, e.g. hard
disc (very fast), magneto-optical (fast), tape (slow). This
requires sophisticated software to monitor the use of
documents and to shift documents from tape to discs
and vice versa.
— A pathfinder, i.e. a storage index that records the physical
locations of all the files in a document and makes the file
map available to the search engine.
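

The monitoring and shifting of documents between fast and
slow media can be sketched as a simple rule based on recent
use. In the Python example below, the thresholds, the 90-day
window and the function names are all assumptions chosen for
illustration; an actual storage manager would apply its own
policy.

```python
def assign_tier(accesses_last_90_days: int,
                hot_threshold: int = 20,
                warm_threshold: int = 3) -> str:
    """Choose a storage medium from recent use (thresholds are assumed)."""
    if accesses_last_90_days >= hot_threshold:
        return "hard disc"
    if accesses_last_90_days >= warm_threshold:
        return "magneto-optical"
    return "tape"


def plan_moves(current: dict, usage: dict) -> dict:
    """Map each document that should move to a (from, to) pair of media."""
    moves = {}
    for doc_id, tier in current.items():
        target = assign_tier(usage.get(doc_id, 0))
        if target != tier:
            moves[doc_id] = (tier, target)
    return moves


# Hypothetical holdings and access counts.
current = {"doc-a": "tape", "doc-b": "hard disc", "doc-c": "magneto-optical"}
usage = {"doc-a": 25, "doc-b": 0, "doc-c": 5}
print(plan_moves(current, usage))
```

Here the heavily used document is promoted from tape to hard
disc, the unused one is demoted to tape, and the third stays
where it is.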


Conversion and Emulation: Do you have to convert 
the format of the document to a new format, or do you have 
to design a system in which the document is stored in the 
original format? Emulation software enables the document 
stored in the original format to be viewed using the new 
hardware and software. 


These techniques are concerned with preservation, and
final solutions have not yet been found. The increasing speed
of technological innovation, new publishing techniques, the
Internet and the present lack of standards are a few examples
of the uncertainties in which the librarian or the manager of
a depository system must work. There is no proven solution
for these systems. Large vendors have built systems for data
warehousing and data mining, but these still lack the
structured indexing and large-scale preservation solutions
needed by libraries and archives.


Besides, there are some issues related to long term 
availability and access for end users. These include : 


Indexing : Descriptive information is indexed for use
within the search engine of the depository system. This engine
can be part of the pathfinder software or can be the OPAC
module of a separate, existing library system, to be defined
locally. Finding the right compromise between indexing
requirements and the technical possibilities is very
complicated.
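

At its simplest, indexing descriptive information means
building an inverted index that maps each term to the
documents whose descriptions contain it. The Python sketch
below is a minimal illustration under that assumption; the
function names and sample records are hypothetical, and a real
search engine would add stemming, field weighting and
relevance ranking.

```python
import re
from collections import defaultdict


def build_index(records: dict) -> dict:
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, description in records.items():
        for term in re.findall(r"[a-z0-9]+", description.lower()):
            index[term].add(doc_id)
    return index


def search_all(index: dict, query: str) -> set:
    """Return the documents that contain every term of the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical descriptive records.
records = {
    "doc1": "Digital preservation of electronic journals",
    "doc2": "Cataloguing electronic documents with MARC",
    "doc3": "Preservation of printed journals",
}
index = build_index(records)
print(sorted(search_all(index, "electronic journals")))
print(sorted(search_all(index, "preservation")))
```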


Access : Access to electronic publications by end users
must be clearly defined. At present, most access is “on-site”
but, when agreements are made with the owners of the
information, remote access may be possible. As with the
deposit for printed publications, electronic deposit collections
should be used as “collections of last resort”. Libraries can,
however, give access when agreements are reached with
publishers and authors.


Copyright Issues, Authors and Publishers : It is very
important that digital archives and libraries discuss
restrictions on access and availability with publishers and
authors when this is appropriate.


Usage of Standards: There are many relevant
standards for electronic publications. The European
Commission has launched an initiative, OII (Open Information
Interchange), as part of the IMPACT2 programme. The aim
of the OII initiative is to promote the awareness and use of
standards for the exchange of information in electronic form.
The target audience is developers and providers of
information products and services, as well as end users.


Standards can be purchased from international 
standard offices and many countries have an organization 
which translates and distributes the standards. For more 
information visit the Commission’s Web site where copies of 
publications on standards can be found. For the preservation 
of electronic publications a variety of standards are relevant. 
These include standards on hardware, operating systems 
(Windows, MS-DOS, UNIX), physical carriers (CD-ROM, 
WORM, DAT, diskettes, magnetic tapes), application
programmes like word processors, databases, spreadsheets
and formats like MARC, SGML, HTML etc.


Printed publications like monographs and serials do not
remain available on the market permanently. After a
relatively short time, a specific edition of a monograph can
be difficult to find in a book shop, although it may still be
possible to order it from a large distributor or even from the
publisher. With off-line electronic publications it is exactly
the same. Publishers are no longer interested in keeping
publications available when there is no commercial interest in
the products. This may be understandable from the market
point of view, but is still unfortunate.


In addition, publishers often do not have a full archive 
of their own publications. It is very important, therefore, that 
as soon as possible after the publication date a document 
should be selected, described and made available by a public 
body like a national archive or a national library. 


4.9. EVALUATION OF ELECTRONIC RESOURCES 


Acquiring any resource calls for evaluation in order to
select a good one. Evaluation of electronic resources often
takes place at the micro level (evaluation of a specific
resource) rather than the macro level (evaluation of an entire
collection of resources). Thus, the traditional collection
development distinction between selection and evaluation
sometimes blurs in regard to electronic resources, as they
are frequently evaluated for selection purposes.


Electronic resources are relatively new on the library
scene. Identifying the criteria to be used is the logical first
step in evaluating these resources. A significant portion of
the criteria relate to content issues. Separate sets of issues
pertinent to collection development, management or
evaluation are outlined below for each category.


Web Sites 
— Identification of web site evaluation criteria. 
— Comparison of different sets of criteria. 


— Addressing the applicability of traditional print evaluation 
criteria to the web. 


— Identification of core web sites. 


Content and Coverage of Full-Text Databases
— The extent or completeness of full-text coverage.
— Overlap in coverage among different databases.
— The quality of the journals contained in the database.
— Currency of database coverage compared to the print
versions of the journals.
— Suitability of a specific database for a particular library.
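

Overlap in coverage between two full-text databases can be
estimated by comparing their journal title lists. The Python
sketch below treats each list as a set; the function name and
the journal titles are hypothetical. In practice titles would
first be matched on a stable identifier such as the ISSN,
since the same journal may be listed under variant titles.

```python
def coverage_overlap(titles_a: set, titles_b: set) -> dict:
    """Summarize shared and unique journal titles of two databases."""
    shared = titles_a & titles_b
    percent = round(100 * len(shared) / len(titles_a), 1) if titles_a else 0.0
    return {
        "shared": sorted(shared),
        "unique_to_a": sorted(titles_a - titles_b),
        "unique_to_b": sorted(titles_b - titles_a),
        "percent_of_a_shared": percent,
    }


# Hypothetical title lists.
database_a = {"Journal A", "Journal B", "Journal C", "Journal D"}
database_b = {"Journal C", "Journal D", "Journal E"}
print(coverage_overlap(database_a, database_b))
```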


Evaluation of Electronic Journals and Other Electronic
Resources
— Methods for counting electronic journal holdings.
— Criteria for choice between the electronic and print
formats.
— Criteria, metrics, or performance measures for
electronic journal evaluation and selection.
— The applicability of traditional evaluation criteria to
electronic resources.
— The development of new metrics or measures for
electronic resources.


Application of Citation Analysis to Electronic Resources
— The extent to which print sources cite electronic
resources.
— The extent to which electronic resources contribute to
scholarly communication.
— The comparative citation rates of print and electronic
journals.
— The continuing accessibility of cited web resources.
— Use of citation data for evaluation of electronic
resources.


Use of Electronic Resources
— Identification of methods lending themselves to valid
cross-library comparisons of use.
— The comparative use of print and electronic resources.
— The impact of electronic resources on the use of print
materials.


The criteria for evaluating printed documents can also
be used for their offline electronic surrogates. Online or web
resources, however, need separate evaluation guidelines,
which are discussed below:


4.10. WEB SITE/INTERNET RESOURCE EVALUATION 


The first web pages emerged in that far-away era of
the early 1990s. E-mail and the Internet were already
becoming well known, but the web, which like e-mail uses the
Internet’s global computer network to share information in
commonly agreed-upon ways, had its start among physicists
only in 1991. It moved into the mainstream in 1993, when the
National Center for Supercomputing Applications (NCSA) at
the University of Illinois released Mosaic, an easy-to-use
graphical web browser that ran on most standard computers.
Between mid-1993 and mid-1995 the number of servers (the
computers that house websites) jumped from 130 to 22,000.
Even with the user-friendly Mosaic encouraging a major
expansion of this new medium, only a few historians ventured
out on the web frontier. Many of the pioneers already had
some technical interests or background. In November 1994
Morris Pierce, an engineer who had recently earned a history
Ph.D., created one of the first departmental websites for the
University of Rochester. It “seemed like a natural thing to do,”
he recalls.


In the fall of 1994, George Welling, who was already
working in the Department of Humanities Computing at the
University of Groningen (Netherlands), developed a course in
computer skills for American history students and asked them
to construct an American Revolution website. Other History
Web pioneers came to the medium out of experience with
earlier Internet applications, particularly e-mail. Joni Makivirta,
a student at the University of Jyvaskyla, Finland, started an
online history discussion list because he noticed lists on other
topics and thought a history list would allow him “to get ideas
from professional historians around the world” for his master's
thesis.


The participants included George Welling; Thomas
Zielke, who later took over the list; Richard Jensen, who went
on to found H-Net in 1993; Don Mabry, a Latin American
historian at Mississippi State University; and Lynn Nelson, a
medievalist at the University of Kansas. In 1991, Mabry,
responding to the difficulty of circulating large documents via
e-mail, began to make available primary sources and other
materials of interest to historians via “anonymous FTP”, a “file
transfer protocol” that allows anyone with an Internet
connection to download the files to their own computers.


Nelson created his own site and then had the idea of 
linking together the emerging set of history FTP sites into 
HNSource using Gopher, a hierarchical, menu-driven system 
for navigating the Internet that was much more popular than 
the web in the early 1990s. In September 1993, just after 
Mosaic was released, Nelson made HNSource available
through the new web protocols, and it became one of the first,
if not the very first, historical sites on the web.




In the 1980s and early 1990s, the most intense energy 
in digital history centered not on the possibilities of online 
networks but rather on fixed-media products like laser disks 
and CD-ROM. In 1982, the Library of Congress began its 
Optical Disk Pilot Project, which placed text and images from 
its massive collections on laser disks and later CD-ROM. With 
a large amount of material already in digital form, the library 
could quickly take advantage of the newly emerging web. In 
1992, it started to offer its exhibits through FTP sites. Two 
years later, the library posted its first web-based collection, 
Selected Civil War Photographs. 


Around the time that these early settlers carved out 
primitive digital history homesteads, the first signs emerged 
that this new frontier might feature more than noncommercial 
exchange. In October 1994 Marc Andreessen and some of 
his colleagues who had developed Mosaic at the government- 
funded NCSA released the first version of a commercially 
funded browser they called Netscape. Within months, Mosaic 
was, as they say, history, and Netscape was king of the World 
Wide Web. The Netscape era saw the History Web come 
into its own. 


In mid-1995, “the explosion in Web sites has brought 
with it an explosion in materials relevant to historians.” Earlier 
that year, the Center for History and New Media (CHNM) had 
helped the venerable AHA (American Historical Association) 
to launch its website; by that summer forty-five history 
departments had posted home pages. The online presence 
of the AHA and the Library of Congress provided an official 
imprimatur to the History Web. But in those early years, 
amateurs, not professional historical organizations, provided 
the crucial energy for much of its growth. Starting in 1995, for 
example, Larry Stevens, a telephone company worker from 
Newark, Ohio, established a series of websites on Ohio in 
the Civil War. The sites combined his two hobbies of history
and computers, and, he explained, he “decided to carve a
niche into the net before the big boys, and Ohio Historical
Society, Ohio State University, etc., entered the field.”


Since the mid-1990s, the History Web has spun its 
threads with astonishing speed. Even by 1996 the “walking 
city” that was the History Web a year earlier had become a 
sprawling megalopolis that no one person could fully explore. 
Yahoo counted 873 U.S. history websites in an incomplete 
census that fall. But seven years later, an even less complete 
tally returned almost ten times as many American history 
websites. These results reveal a deep and wide fascination 
with history among the web-browsing public. 


Since the start of the 21st century, the number of web
pages, home pages and Internet resources has been doubling
almost every year, so it has become necessary to develop
tools to evaluate them. The founder of the Infofilter Project
argues that librarians should capitalize on the patron trust
they have earned from their book recommendations and apply
“well-developed and tested principles of reference reviewing”
to web resources. Collins criticizes web-based reviewing
tools, such as Magellan, as “mostly ingenious variations on
the concept of ‘cool.’” Six web site evaluation standards are
proposed. These are :


• Content - Uniqueness, usefulness, and accuracy are
listed.
• Authority - The credibility of those responsible for the
site.
• Currency - How often is it updated?
• Organization - To which it belongs.
• Search Engine - Does it include Boolean and keyword
searching plus relevance ranking?
• Accessibility - Is it consistently available?
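

For illustration, the six standards above could be applied as
a simple scoring rubric. In the Python sketch below each
criterion is rated from 0 to 5 and the ratings are combined
into a weighted average. The rating scale, the equal default
weights and the function name are assumptions made for this
example; they are not part of the proposed standards.

```python
from typing import Optional

CRITERIA = ("content", "authority", "currency",
            "organization", "search_engine", "accessibility")


def site_score(ratings: dict, weights: Optional[dict] = None) -> float:
    """Weighted average of 0-5 ratings over the six criteria."""
    weights = weights or {name: 1.0 for name in CRITERIA}
    missing = [name for name in CRITERIA if name not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for name in CRITERIA:
        if not 0 <= ratings[name] <= 5:
            raise ValueError(f"rating for {name} must be between 0 and 5")
    total_weight = sum(weights[name] for name in CRITERIA)
    weighted_sum = sum(ratings[name] * weights[name] for name in CRITERIA)
    return weighted_sum / total_weight


# Hypothetical ratings for one web site.
ratings = {"content": 5, "authority": 4, "currency": 3,
           "organization": 4, "search_engine": 2, "accessibility": 5}
print(round(site_score(ratings), 2))
```

A library that values content above everything else could
pass its own weights, for example doubling the weight of
"content", without changing the rest of the rubric.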




Another set of criteria comes from an ongoing University
of Georgia project to establish criteria for the evaluation of
Internet resources. An initial set of 509 possible quality
indicators was identified through an e-mail survey of Internet
resource guide compilers, as well as through a review of
Internet rating systems, the library science literature on
reference and other topics, and the web design literature.
After editing, the indicators were reduced to 125 criteria and
organized under eleven categories:


• Site access and usability.
• Documentation and resource identification.
• Author identification.
• Author’s authority.
• Information structure and design.
• Content relevance and scope.
• Validity of content.
• Content balance and accuracy.
• Navigation.
• Link quality.
• Aesthetic aspects.


Five to eighteen indicators are listed under each 
heading. For example, the six points under relevance and 
scope consider user need, provision of new information, and 
obvious omissions, while the nine criteria under content 
validity include peer review, bibliographies and footnotes, and 
statistics to support conclusions. 


In practice, however, the criteria employed for evaluating
Internet information resources can broadly be divided into
the following four categories:

• Initial appraisal
• Suitability of resource
• Content analysis
• Structure and presentation


4.10.1. Initial Appraisal 


The initial appraisal of an Internet information resource 
may be made based on the following criteria: 


Author : The author's credentials, i.e. institutional
affiliation, educational background, other scholarly works,
experience, etc. Secondary services and online databases
may be used to determine how prolific an author is.
Biographical sources may be used to determine the author's
credentials. A citation index can be used to find how
frequently an author is cited.


Date of Publication : The date of publication or date of
last revision is an indication of the currency of the
information. The date of last revision is generally given on
the home page of a site.


Edition or Revision : Revisions or updates reflect
changes in the subject contents.


Publisher : The publisher does not necessarily guarantee
quality. However, publications from a university press or
scholarly society are likely to be of high scholarly value.


Title of Journal : Is the journal popular or scholarly?
The two kinds have different target audiences and present
items at different levels of complexity.


4.10.2. Suitability of Resource 


After the initial appraisal of a resource, determine the
author’s intentions in publishing the Internet information
resource. Scan its contents and indices to determine the
suitability of the resource for the meta resource on the
following criteria:


Scope and Coverage : The scope and coverage of an
information resource is an important consideration in its
evaluation. Since most Internet-based information resources
do not have a formal introduction or preface, determining the
scope and intended audience can be a daunting task. The
breadth and depth of an Internet information resource
determine its suitability for a meta resource. The time period
covered in an information resource is also an indication of its
coverage. A resource may be an overview of a topic or it may
be specifically focused on only one aspect of the topic.


Factual versus Opinion : The information contents
should be factual; they should not be propaganda,
advertisement or opinion. It is not always easy to separate
facts from opinions. Facts can usually be verified. Opinions,
though they may be based on factual information, evolve
from the interpretation of facts. Skilled writers may present
their interpretation of facts as facts.


Primary versus Secondary : Assess whether the
information is primary or secondary in nature. Primary sources
are the results of original research, while secondary sources
are derived from the primary sources. Scholars use primary
sources to further their research work as well as for writing
secondary works like textbooks and encyclopaedia articles.
Books, encyclopaedia articles, etc. are secondary sources of
information, while research articles in journals and
conference proceedings are primary sources of information.


Scholarly versus Popular : A scholarly journal is
generally one that is published by and for experts. The articles
in a scholarly journal go through a process of peer review in
which a group of widely acknowledged experts in a field review
the article for its contents, scholarly soundness and academic
value before it is accepted for publication. Scholarly journals
publish new, previously unpublished research.


Popular magazines range from highly respected
publications such as Scientific American to general-interest
news magazines like Newsweek and Time. Articles in popular
magazines are generally written by staff writers and
freelance journalists. The articles in popular magazines do
not go through the process of peer review and rarely contain
bibliographic references.


Audience : The information contained in a resource
should be relevant to the person using it, i.e. to the audience
at whom it is aimed. An information resource on the Internet
should clearly define its potential audience. Ranganathan’s
laws “Every book its reader” and “Every reader his/her book”
may be adapted as “Every website its surfer” and “Every
surfer his or her website”. Hence, a website should define its
purpose and targeted audience clearly so as to find its users.
Lack of a focused audience might prevent a strong connection
between sites and their users, and will ultimately render the
site under-used, unused or unusable. The site should clearly
indicate (i) the type of audience it is targeting; (ii) whether
the information is targeted at a specialized or a general
audience; and (iii) whether the information content of the site
is elementary, technical or advanced. Besides, the
information needed to judge the suitability of an information
source includes its intended coverage and its intended
audience.


4.10.3. Content Analysis 


Content analysis includes the following aspects.


Accuracy : Information contents of a resource should 
be accurate. The contents of a resource should be reliable 
and error-free. References to published information indicate 
that information has a research basis. Accuracy is also 


Documents and Resources in Digital Libraries 365 


assured if the information contents of a resource have
undergone the process of refereeing or editorial control.


Authority and Reputation : The reputation of an author 
as an accomplished authority in his field of study is an 
important criterion of evaluating traditional as well as Internet 
resources. It is especially applicable to Internet resources given
the fact that anyone having access to a website can publish
any information on the Internet without going through the
process of reviewing, refereeing and editing. An author’s
affiliation to an organization of repute is also an indication to 
his/her authority. References, bibliography and/or footnotes 
indicate that the author has consulted other sources and 
services to authenticate the information that he or she is 
presenting. 


Objectivity : Information contents of a resource should 
be factual, unbiased and written most objectively. The 
information contents should not be propaganda, 
advertisement or opinion. It is not always easy to separate 
facts from opinions. Facts can usually be verified; opinions,
though they may be based on factual information, evolve from
the interpretation of facts. Well-researched information should
be supplemented with evidence, references to past work
and footnotes. The ideas and arguments advanced in the
information resource should be consistent with other works on the
same topic. The more radically an author departs from the
views of others in the same field, the more careful and critical 
an evaluator should be to scrutinize his or her view. 


The potential for bias introduced by an individual or 
organization involved in the production or dissemination of 
information, such as host of a web site, a publisher or a 
sponsor, can also impact upon the potential accuracy of a 
resource. Information resources with bias can either be
excluded from a meta resource or be included with a
note in the resource description highlighting the source of
bias of any kind.


Currency of Information : The date of the last update 
given on the site indicates currency of a resource. For 
individual documents, the date indicates when the document 
was written or last updated. For resources where there is a 
regular change to the contents, such as journals, databases 
or news information, frequency or regularity of updating is 
the indication. 


Completeness : The information contents of a website 
should be complete and comprehensive. The information 
contents of a resource should not have noticeable omissions. 


The information that is required to be obtained in the
process of content analysis includes: details of any
organizations and/or individuals involved in the production 
and dissemination of the information, including the author, 
webmaster or equivalent, copyright owner, publisher, 
sponsor, etc.; contact details; copyright statement; subjects 
and types of materials covered; comprehensiveness of 
coverage; notable omissions; notable indicators of accuracy, 
e.g. potential for bias, ability to e-mail corrections; audience 
and level of detail if explicitly stated; the provenance of the 
source; editorial or refereeing procedures; research basis to 
the information; and frequency and / or regularity of updating. 


4.10.4. Structure and Presentation 
This point can further be elaborated as under :


Writing Style : The writing style followed for an 
information resource would largely depend on the targeted 
audience. In general, the text should be easy to read. It should
follow the basic rules of grammar, spelling and literary
composition. The arguments put forth by the author should
not be repetitive. 


Structure : The information resource should be
organized logically with major points or headings clearly
presented. The resource should follow the basic principles of 
graphic design, wherever applicable. The graphics used in 
the information resource should add to the information 
contents and not distract the users from it. Features such as 
site map, index, and menu system or search facility enhance 
the usefulness of an information resource. A well-structured 
site should lead a user to the information he or she needs 
within a reasonable number of links, preferably 3 or 4. 


Design and Layout : Layout and design of a website 
should communicate a sense of location to the user, based 
on apparent patterns and consistent use of visual elements 
such as headings. Patterns in the background should help 
identify the page or orient the user to location within a complex 
site. The images should act as visual clues to orient the user 
within a site that is large or subtly organized. The element of 
consistency should be maintained between different parts of 
the same resource. It is important that the information 
resource has adequate navigation aids to facilitate easy 
movement of user within and outside the site. Each screen 
should offer a direct route back to higher levels or even direct 
connection to major parallel areas within the site. Use of
frames can provide immediate access to the main constituent
parts of the site while providing the overall structure of the
site.


Further, every element of a website should justify itself 
because unnecessary HTML affects loading time and can 
slow down the site. HTML showmanship is a disservice to users
and devalues a site. To paraphrase Ranganathan’s ideas
for the Internet environment: “Save the time of the Web surfer”.
The web may be free but the user's time is not. 


Ease of Use : The information source should be easy
to use even for a novice user. Most sites provide contact
information and user support services or links to training
courses, user discussion lists or user support groups as
additional information.


Accessibility and Reliability : The information resource 
should be easily accessible and quick to load. It should be 
compatible with the most popular browsers like Internet 
Explorer and Netscape Navigator. If the information resource 
requires another piece of software or plug-in to display the 
information contents, mention should be made of that. For 
example, most e-journals are available in PDF format, which
requires Acrobat Reader to display or use the PDF files.
Several sites have Flash animation that requires the Flash
software to display it.


Similarly, the site should be stable, i.e. the source 
should not change very frequently. In case of change in URL, 
the old URL may be used to provide a link to the new page. 
Besides, the information that is required to be obtained while
assessing structure and presentation includes:


• If a resource is frequently unavailable or noticeably slow
to access;

• Any access restrictions, e.g. by geographical region,
hardware/software requirements;

• If there is a registration procedure and whether this is
straightforward;

• The copyright statement and any copyright restrictions;

• Notable design features and facilities, whether
particularly good or particularly bad;

• Appropriate or inappropriate use of images and/or
advertising;

• If the site is particularly difficult or particularly easy to
use;

• Presence or absence of user support facilities and/or
help information; and

• Particularly good or particularly bad help information
or support services.


4.11. PROCESS OF EVALUATION 


The process of evaluation of Internet information 
resources consists of the following steps: 


4.11.1. Identification of Links to Resources 


Identification of links to resources to be included on a 
meta resource is the first step in the process of evaluation. 
Identification of information resources may be done using 
mailing lists, distribution lists, other meta resources, Internet 
resources newsletters, Internet search engines, speciality 
search engines, directories of Internet resources, etc. 
Besides, Internet alerting services like Gary Price’s New
Resource Bulletin and Scout Report (http://scout.cs.wisc.edu),
or listservs and discussion groups can
also be subscribed to. One can also obtain the links to 
resources through newspapers, magazines, e-mails, etc. 


4.11.2. Follow a Link to Find Out More about the Resources 


Following a link to the document where the link is 
originally referred can provide details about the intended 
scope and audience, and whether the information is likely to 
be updated and how often. The links to the present document 
can lead to the producer of information and its origination. 
The details like provenance of the source, individuals or
groups responsible for information, details of their expertise,
details of organizations involved in the production and
dissemination of information, details of their reputation and
expertise within the field, can also be obtained. Contact details
and copyright information may also be collected which can 
be used for assessing the authority of the information. 




Questions that an evaluator should be trying to answer in 
this step are: 


— What is the subject scope of the resource and is it 
relevant to the meta resource? 


— Who is the intended audience?

— Who is responsible for the information resource?

— Is the individual or group responsible for the information
resource qualified to provide this information? Are
contact details available?

— Is the organization, such as publishers, sponsors or
funding agencies, responsible for the information,
reputable and recognized?

— Is the resource well-known and / or heavily used?

— What is the provenance of the resource? Does it have
a print or electronic predecessor and how long has it
been available?

— Is the information likely to be kept up-to-date?

— Are there any access restrictions?


4.11.3. Analyse the URL 


The URL (Uniform Resource Locator) provides useful
information for evaluation of an Internet resource. Most often
the URLs consist of meaningful words or phrase(s) conveying
the contents or purpose of a website. The URLs also provide
indication about where the information comes from, who has
produced it or why. Different countries and organizational
domains are represented differently in URLs. Important
prevalent domain names are:


.ac .biz .cc .com .edu .gov .info .net .org .co .mil


Countries except the US have an additional country
code (e.g. “.in” for India), although many sites do not use
country codes. “www.alldomains.com” provides a complete
list of country codes along with sub-domains within each
country.
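
The way a domain suffix hints at the origin of a resource can be sketched in a few lines of Python. This is only an illustration: the function name `domain_hint` and the table `CATEGORY_HINTS` are hypothetical, the meanings given for the suffixes are their conventional ones rather than anything stated in this chapter, and registries do not always enforce them. The URL must include its scheme (http://) for the host name to be recognized.

```python
from urllib.parse import urlsplit

# Conventional meanings of some of the suffixes listed above.
# This table is an assumption made for illustration only.
CATEGORY_HINTS = {
    "ac": "academic institution",
    "edu": "educational institution",
    "gov": "government body",
    "mil": "military",
    "com": "commercial organization",
    "co": "commercial organization",
    "org": "non-profit or other organization",
    "net": "network provider",
}


def domain_hint(url):
    """Return (category, country_code) guessed from the host name of a URL."""
    labels = urlsplit(url).hostname.lower().split(".")
    country = None
    # A two-letter final label that follows a known category label is
    # treated as a country code, as in "ac.in".
    if len(labels) >= 2 and len(labels[-1]) == 2 and labels[-2] in CATEGORY_HINTS:
        country = labels.pop()
    category = CATEGORY_HINTS.get(labels[-1], "unknown")
    return category, country


if __name__ == "__main__":
    print(domain_hint("http://www.iitd.ac.in/library/"))
    print(domain_hint("http://scout.cs.wisc.edu"))
```

Such a guess is only a starting point for the evaluator; the suffix suggests where to look, and the authority of the producer still has to be checked on the site itself.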


You can delete portions of the tail-end of a URL to find out
more about the resource. For example, in the URL
http://www.iitd.ac.in/library/services/erl.html,
the deletion of “services/erl.html” would lead you to
the home page of the Central Library, IIT Delhi. Further,
deletion of “library” would lead you to the home page of IIT
Delhi. In other words, the tail-end of a URL takes you to
sub-parts of a website while the main URL represents the parent
organization.
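
The tail-end deletion described above can be automated. The following is a minimal Python sketch, assuming the IIT Delhi example URL reads http://www.iitd.ac.in/library/services/erl.html; the function name `parent_urls` is hypothetical. It lists the URL itself and then every shorter URL obtained by removing one trailing path segment at a time, so that an evaluator can visit each level in turn.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the URL followed by each shorter URL obtained by
    deleting one trailing path segment at a time."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    results = []
    for n in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:n])
        # Intermediate levels are directories, so end them with a slash
        if 0 < n < len(segments):
            path += "/"
        results.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return results


if __name__ == "__main__":
    for level in parent_urls("http://www.iitd.ac.in/library/services/erl.html"):
        print(level)
```

Not every truncated URL is guaranteed to resolve to a page; whether an intermediate level such as “/library/services/” exists depends on how the site is organized.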


Deleting parts of URL is a useful technique that can be 
used to assess the authority of a resource and those 
responsible for producing it. Questions that an evaluator 
should be trying to answer in this step are: 


— Where has the information come from? 


— Has an individual or group taken responsibility for the 
resource? Are they qualified to provide this information? 
Are contact details available? 


— Is an organization responsible for the information? Are
any organizations associated with the resource, such 
as publishers, sponsors or funding agencies, reputable 
and recognized? 


4.11.4. Examine the Information 


Once the authority of a resource and its producer of 
information are established, you need to examine the 
information contained within it to assess its coverage, 
accuracy and its currency. 


Assess the Coverage of the Resource : The
examination of index, contents pages and site may be made
to get an assessment of the range of subjects covered within
a resource, whether a resource is comprehensive or if there 
are notable omissions. Evaluation of some of the resources 
may be a daunting task because of their comprehensive 
coverage. One could browse major headings to assess the 
types of materials that are covered and comprehensiveness 
of coverage within different areas. Search facility on site can 
also be used to find specific areas and to identify omissions. 
Although many sites state their target audience,
this needs to be confirmed by browsing the information and
reading some of the text. Questions that an evaluator should 
be trying to answer in this step are: 


— What is the subject coverage of the resource?
— Is the resource relevant to the meta resource?
— Is the resource comprehensive within its given area?


— What is the range of different subjects covered within
the area? 


— What is the retrospective coverage of the source? 
— Does the resource cover the subject adequately? 


— Does the information provided have sufficient details 
for the target audience? 


— Are there any links to further information? 


— Does the link add value to the existing information or is 
it of value as an information source in its own right? 


Accuracy of Information : The accuracy of a resource 
could be assessed by searching sites for information known
to you; alternatively, an expert in the area may be consulted
for this purpose. If neither of the two options is available, a range
of other factors can be used as indicators of accuracy. Some
of them are as follows: 


References to published information are indications of
the fact that the information has a research basis. The site
might also indicate whether the process of refereeing or editorial
control has been exercised before publishing the information
on the site. The resource may put in place a mechanism for
users to ascertain its accuracy and quality of information by
providing links to other structural resources. The information
content of a site may also be biased which, in turn, may affect
the potential accuracy of a resource. Such material, if included
on a meta resource, could be included with a note in the
resource description highlighting the source of potential bias.
Questions that an evaluator should be trying to answer in
this step are:


— Is the information accurate? 


— Has the information gone through a process of editing 
or refereeing? 


— Does the information have a research basis? 


— Is the information supported by published research 
findings? 

— Is there any evidence that the source may be biased 
by those involved in its production and / or 
dissemination? 


— Is there a facility for sending corrections to inaccurate
information? 


— Is the source professionally presented? Are there any 
typographical or grammatical errors? 


Currency of Information : The resource site generally 
provides information on date of production and updating of 
materials as well as details about the frequency and regularity 
of updating. Individual documents may indicate when they 
were written while resources such as journals, databases or
news information are updated at a regular frequency. Resource
description may include such details. 




However, currency of information can also be verified 
by searching current facts and by browsing through hypertext 
links to assess whether they have been maintained. 
Questions that an evaluator should be trying to answer in 
this step are: 


— Is the information up-to-date? 
— Is the information likely to be kept up-to-date? 


— Where applicable, how frequently and /or regularly is 
the information updated? Is this appropriate to the type 
of information? 
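
Checking whether the hypertext links of a page have been maintained, as suggested above, lends itself to a small script. The sketch below is an assumption-laden illustration in Python: the names `LinkCollector` and `broken_links` are hypothetical, and the test of whether a link still works is passed in as a function (`is_alive`) so that the example runs without a network connection. In practice that function might try to open each URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL
                self.links.append(urljoin(self.base_url, href))


def broken_links(html, base_url, is_alive):
    """Return the links found in `html` for which is_alive(url) is false."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if not is_alive(url)]


if __name__ == "__main__":
    page = '<a href="/a.html">A</a> <a href="http://example.org/b">B</a>'
    # Stand-in check: pretend that only pages ending in "a.html" still exist
    print(broken_links(page, "http://example.com/index.html",
                       lambda url: url.endswith("a.html")))
```

A high proportion of dead links is only an indicator that a resource may not be kept up-to-date; it does not by itself show that the information is out of date.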


4.11.5. Assess Accessibility 


Contrary to popular belief, not all information resources
on the Internet are free. Several Internet resources have
restrictions on their accessibility such as cost, access limited to
certain geographical regions, requirement for specific hardware
or software tools and the need to register. The level of complexity
of the registration process may also be indicated. The 
resource descriptions may also include modes and level of 
charging. 


Accessibility may also be assessed in terms of time 
taken in accessing a resource. Some sites are particularly 
slow to access, sometimes because of inclusion of large 
graphics. Such limitations may be mentioned in the resource 
descriptions. The accessibility options may also include 
alternative URLs, mirror sites, if available, or sites in other 
languages. Mention may also be there about the provision of 
copyright especially if the resource is freely available for reuse. 
Questions that an evaluator should be trying to answer in 
this step are: 


— Is the resource frequently unavailable? 


— Do the graphics / pictures inhibit ease of access? 


— Is there a mirror site?
— Are there any geographical access restrictions? 


— Is special hardware or software required to access the 
resource? 


— Do users need to register to use the resource, and if 
so, is the registration a straightforward process? 


— Is there a charge to access the resource? 


— Is the resource written in English? Is a special character
set required? 


— Is the information in the public domain or are there 
copyright restrictions?


4.11.6. Consider the Design and Layout of the Material 


The design and layout of a site can enhance its usability;
likewise, valuable contents may be made hard to use by poor
design. An Internet resource should, therefore, be assessed
for its overall design, professional presentation of resource 
and consistency of design between different parts of the same 
resource. A proper navigation system can enhance the 
usability of the site. Features such as site map, index, menu 
system or search facility should, therefore, be considered as 
methods to enhance effectiveness and usefulness of an 
Internet resource. Navigation systems within a document, 
from document to document and outside the document may 
be noted. The evaluator of an Internet resource site may also 
wish to note the use of images and whether they have been 
used appropriately or whether they are merely decorative and 
add no value to the contents of a resource. If the site 
incorporates advertisements, assess if they distract or add 
to the value of information. Questions that an evaluator should 
be trying to answer in this step are: 


— Is the resource well-designed? 




— Is the information professionally presented? 

— Is the design consistent in different parts of the same
resource?

— Does the source contain finding aids, such as a site
map, index, and menu system or search facility?

— Are the links between pages useful and are there any
navigation aids available to guide users?


— Are images used appropriately or are they merely
decorative?


— Is advertising used appropriately or does it distract from
the value of the information?


4.11.7. Consider the Ease of Using the Resource 


Assess the Internet information resource for its ease of 
use. The information resource may also contain help 
information or user support service, FAQ or read-me file, e- 
mail address, telephone line, or availability of training course, 
discussion list or user support groups. Questions that an 
evaluator should be trying to answer in this step are: 


— Is the source easy to use?

— Is there any help information? Is it useful? Is it
context-sensitive?

— Are there any user support facilities? Are they useful/ 
responsive? 


4.11.8. Obtain any Additional Information 


Additional information about the quality of Internet 
information resources can be obtained from professional and 
academic journals. Inclusion of an Internet information 
resource in other meta resources is an indication of its quality. 
You can use the “link” facility of Alta Vista to find links given to a
resource. In the Alta Vista query box, type “link:” followed by
the URL of the site you are evaluating. Alta Vista will list all
the sites it can find which are linked to the site being evaluated.




You can also search a bibliographic database to determine 
how extensively an author has published in a given area. 
Questions that an evaluator should be trying to answer in 
this step are: 


— If an individual or group has taken responsibility, are 
they qualified to provide this information? 


— Is the resource well known and/or heavily used? 


4.11.9. Compare the Resource to Other Similar Material 


Comparison of Internet information resources on similar
subjects/topics helps in estimating the value and usefulness
of a particular resource. Special note may be made of
anything unique that the site offers in terms of its coverage or
format. Questions that an evaluator should be trying to answer 
in this step are: 


— How does the source compare with others? 


— Does the source offer anything unique in terms of its 
coverage or format? 


— Is there a print or other equivalent to the resource? How
do they compare? How do they compare in terms of 
the cost and value for money? 


— Is there a mirror site that is accessible faster? Is there 
any difference between the original site and the mirror 
site in terms of coverage? Is there a lag between 
updating the original site and the mirror site? Does the 
mirror site or original site provide any special features? 


4.12. ONLINE SEARCHING 


Online information retrieval or online searching is the 
acquisition of information from a distant computer via a 
terminal or PC, involving an interactive dialogue between the 
user and computer. The computer handles a number of
databases stored in electronic form, consisting of references
to journal articles, conference papers, reports, books etc, 
which the Information Retrieval Service (IRS) or ‘host’ makes 
available to interested parties, such as university libraries, 
on a commercial basis. The computer matches any input 
search terms against its files and displays any resulting 
matches which can then be printed out or downloaded by the 
searcher. 


The choice of database depends on search topic. If, 
for example, the student is interested in the chemical aspects 
of a dielectric material, he might choose to access the online 
version of Chemical Abstracts Service (CAS) at <http://info.cas.org>.
Chemical Abstracts or CA File is the most
relevant English language database with world-wide coverage 
of literature in many areas of chemistry and chemical 
engineering. 


An online search consists of the following elements : 
— Formulating the search strategy, 
— Choosing a database and “host”, 
— Carrying out the search online, 
— Looking at the results and identifying items of interest,
— Refining the search when necessary, and
— Saving the results.


4.12.1. Formulating the Search Strategy 


The development of an effective search strategy is 
essential if one hopes to obtain satisfactory results regardless 
of the search tool being used. A simplified, generic search 
strategy might consist of the following steps: 


— Formulation of the research question and its scope. 


— Identification of important concepts within the question. 




— Identification of search terms to describe those 
concepts. 


— Consideration of synonyms and variations of those 
terms. 


— Preparation of the search logic. 


This strategy should be applied to a search of any 
electronic information tool, including library catalogues and 
CD-ROM databases. However, a well-planned search 
strategy is of especially great importance when the database 
under consideration is one as large and amorphous as the 
World Wide Web. The factor that underscores the need for 
effective Web search strategy is the fact that most search 
engines index every word of a document. This method of 
indexing tends to greatly increase the number of results 
retrieved, while decreasing the relevance of those results, 
because of the increased likelihood of words being found in 
an inappropriate context. When selecting a search engine, 
one factor to consider is whether it allows the searcher to 
specify which parts of the document to search, e.g., URL, title,
first heading, or whether it simply defaults to search the entire 
document. 


The most productive searches are those where the
information seeker has spent time working out a search 
strategy before going online. The strategy is a pre-requisite 
for anyone attempting exhaustive searching, such as those 
embarking on a PhD, and recommended practice for any 
student wishing to conduct an efficient search and avoid 
frustration caused by low retrieval. In situations where connect
time is charged for, a search strategy is essential to prevent
escalating costs.

Then work out your specific information need and
identify the different major concepts and alternatives. For
example, the topic Inorganic fertilizers divides into two main
concepts:
— inorganic fertilizers
— soil fertilization
Put ideas on paper in natural language.


Examine each concept to find as many synonyms and 
terms as you can think of, and group the related items together 
to provide the basis of a structure for searching: 


Inorganic fertilizers, Soil fertilization 
Soil fertilizers, Fertilizers producing factories 


Consider the levels - the amount of information you 
want, any limitations by date, language, etc., and add these 
qualifications to the structure. 
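
The structure built up in this way, with synonyms grouped under each concept, can be turned into search logic mechanically. The following Python sketch is an illustration only: the function name `build_query` is hypothetical, and the use of parentheses for grouping and of quotation marks for multi-word phrases is an assumption, since each host or search engine has its own syntax. Synonyms within a concept are joined with OR and the concepts are joined with AND, the Boolean operators described in section 4.12.2.

```python
def build_query(concept_groups, exclude=()):
    """Join synonyms with OR inside each concept and concepts with AND.

    concept_groups: a list of lists, one inner list of synonyms per concept.
    exclude: terms to be removed from the results with NOT.
    """
    def quote(term):
        # Quote multi-word terms so that they are searched as phrases
        return f'"{term}"' if " " in term else term

    blocks = []
    for synonyms in concept_groups:
        joined = " OR ".join(quote(t) for t in synonyms)
        blocks.append(f"({joined})" if len(synonyms) > 1 else joined)
    query = " AND ".join(blocks)
    for term in exclude:
        query += f" NOT {quote(term)}"
    return query


if __name__ == "__main__":
    concepts = [
        ["inorganic fertilizers", "soil fertilizers"],
        ["soil fertilization"],
    ]
    print(build_query(concepts))
    print(build_query(concepts, exclude=["organic"]))
```

Writing the groups down first and generating the query from them keeps the search logic consistent when a synonym is added or dropped during refinement.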


4.12.2. Developing the Search Strategy 


Boolean logic is the term used to describe certain logical 
operations that are used to combine search terms in many 
databases. The basic Boolean operators are represented by 
the words AND, OR and NOT. 


If you need to pose a more specific query, use the
Boolean operator AND, which limits results to those items
that contain both or all of the search terms in your query.
Using the two concepts from the example above, the
following search query would retrieve only those items containing both
terms in the same item :


Inorganic Fertilizers AND Soil fertilization 


This search query would return a much smaller set of 
hits, and the items would be more applicable to the field of 
inorganic fertilizers. To demonstrate the difference between
the OR and the AND operators, we can run both searches
on the Internet. The search query Inorganic fertilizers
OR Soil fertilization returned over 30,000 items, while the
query Inorganic fertilizers AND Soil fertilization returned 175
items.


The OR operator is useful for the first phases of a 
search, when you are not exactly sure what information is 
available on your topic or what words are used to categorize 
it. When used between two words, the OR operator instructs 
the search tool to retrieve any record containing either of the 
words. For instance, the following search query would retrieve items
containing either the word “fertilizers” or the term “fertilization”: 


Inorganic fertilizers OR Soil fertilization 


Once you view the types of items containing either word, 
you might want to narrow your search by dropping one term 
and confining your search to the other. For instance, you might 
find that the records indexed under the term “fertilizers” are 
more relevant to your research question than those indexed 
under “fertilization”. Or, as in the example, you might find 
that the items related to the specific field of “soil fertilization” 
must contain both words, not simply either one. Because OR
is the Boolean operator that returns the most “hits”, search
queries containing OR are very broad and sometimes return 
items that are not relevant. 


The last of the three most common Boolean operators
is the word NOT. The NOT operator is used to eliminate 
records containing a particular word or combination of words 
from your search results. For instance, if you are performing 
a general search on soil fertilization, you might wish to exclude 
items dealing with the very specific discipline of “fertilizers 
production”. To make this exclusion, you could construct your 
search query as: 

Fertilizers NOT Organic 


This search would return all items containing the word
“fertilizers” except for those that also contain the word
“organic”.
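
The effect of the three operators can be seen by treating the records that contain each word as a set: OR is the union of two sets, AND their intersection, and NOT their difference. The Python sketch below assumes four made-up record titles and a simple word index; the names `RECORDS`, `build_index` and `hits` are hypothetical and do not describe how any particular host stores its data.

```python
# Hypothetical records: record number -> title
RECORDS = {
    1: "inorganic fertilizers for wheat",
    2: "soil fertilization methods",
    3: "organic fertilizers and soil fertilization",
    4: "inorganic fertilizers in soil fertilization",
}


def build_index(records):
    """Map every word to the set of record numbers that contain it."""
    index = {}
    for number, text in records.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(number)
    return index


INDEX = build_index(RECORDS)


def hits(word):
    """Return the set of record numbers containing the word."""
    return INDEX.get(word.lower(), set())


either = hits("fertilizers") | hits("fertilization")   # OR: union
both = hits("fertilizers") & hits("fertilization")     # AND: intersection
not_organic = hits("fertilizers") - hits("organic")    # NOT: difference

if __name__ == "__main__":
    print("OR :", sorted(either))
    print("AND:", sorted(both))
    print("NOT:", sorted(not_organic))
```

In this small example OR retrieves all four records, AND retrieves only records 3 and 4, and NOT retrieves records 1 and 4, which mirrors the broadening and narrowing behaviour described above. Note that the word “inorganic” is not matched by “organic” here because whole words are compared.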




Besides, when you visit a search site, always read the
instructions or help file before beginning your search. Each
search engine has different parameters for using upper- and
lower-case letters and combining Boolean operators. Another
good method for refining your search is to run a few searches 
experimentally to see what results are returned. By browsing 
through your results list, you can determine whether or not 
your strategy is returning relevant items. Then, you can 
construct a search strategy using the Boolean operators OR,
AND, and NOT to improve your results. 


If you are unsure which database to choose, help is at
hand online. Some major hosts provide the facility for 
comparison of the number of occurrences of input search 
terms within each database they hold. However, it is advisable 
to ascertain names of the major databases in your area before 
you commit yourself to accessing a particular host which may 
not provide those particular databases. 

You can access a particular host computer via your 
terminal/PC by means of national and international data 
networks and once linked to your host you can call up the 
database you require. 


For example STN - the Scientific & Technical Network
(<http://www.stn-international.de/>) carries many databases
in the fields of chemistry and chemical engineering. STN is
widely used in European universities, many of which make
use of the special academic password which can be supplied
to educational institutions. 


4.12.3. Carrying out the Search Online 


The instruction in searching that follows is generally
applicable to all databases and host services. However,
software commands are still host specific and databases
remain un-standardized in their structure and layout. While
there has been partial adoption of similar codes by different
hosts following the introduction of the Common Command
Language, enabling you to switch more easily from one host
to another, and a corresponding simplification in the number
of codes used within a database, the diversity remains.
Detailed descriptions of any one host service’s commands
have therefore not been included within the body of the text.


Key in your search terms with the appropriate 
commands. A typical search consisting of a select command 
statement and system response of numbers of records 
retrieved, might look like this: 


=> S DISTRICT AND HEATING 
536 DISTRICT 
814 HEATING 

L1 28 DISTRICT AND HEATING 
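
The shape of this response (a count for each term, followed by 
a numbered answer set) can be imitated with a short sketch. The 
class name SearchSession, its method and the toy postings are 
assumptions made for illustration; they do not reproduce any 
host's actual software, so the counts differ from those above.

```python
class SearchSession:
    """Toy model of a host session that numbers each answer set (L1, L2, ...)."""

    def __init__(self, index):
        self.index = index  # term -> set of record numbers
        self.sets = {}      # set label such as "L1" -> set of record numbers

    def select_and(self, *terms):
        """Imitate 'S TERM1 AND TERM2': list postings, then store the answer set."""
        lines = []
        postings = []
        for t in terms:
            hits = self.index.get(t.upper(), set())
            postings.append(hits)
            lines.append(f"{len(hits):>6} {t.upper()}")
        result = set.intersection(*postings) if postings else set()
        label = f"L{len(self.sets) + 1}"
        self.sets[label] = result
        query = " AND ".join(t.upper() for t in terms)
        lines.append(f"{label} {len(result):>3} {query}")
        return label, lines


# Invented postings: which record numbers contain each term.
toy_index = {"DISTRICT": {1, 2, 3}, "HEATING": {2, 3, 4, 5}}

session = SearchSession(toy_index)
label, response = session.select_and("district", "heating")
for line in response:
    print(line)
```

Because each answer set is stored under its label, a later 
command can refer back to it instead of repeating the search.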


4.12.4. Looking at Results and Identifying Items of Interest 


You can then display the records, with reference to 
record formats for display commands and record format 
examples. Examine a sample of the records for relevance. 
You can choose whether to view the whole of a record or part 
of it; displaying 10 items in title-only format gives an indication 
of the relevance of the ‘hits’. 


4.12.5. Refining the Search, when Necessary 


If the titles are not what you would expect, think again 
about the terms you have used and modify your search 
accordingly. A closer look at the indexing terms may indicate 
that your search could be improved by the addition of some 
of those terms, either to more precisely focus on the search 
topic, or to extend it. Selecting from displayed references a 
term that aptly describes your topic, should retrieve all those 
records that otherwise did not match your initial search terms. 


Then refine your search by combining such new search 
terms with previous sets where appropriate, or create a whole 
new sequence of sets. Repeat the procedure until you feel 
that your search string is satisfactory. If your records are 
relevant but too numerous you may be able to reduce the 
total by limiting by timespan, language, document type or 
restricting to basic index fields, depending on the host service. 
Limits can be applied before, during, or after you have used 
the search command. 
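
Limiting an answer set that is relevant but too large can be 
sketched as follows. The Record fields, the limit function and 
the sample records are hypothetical; actual hosts offer their 
own limit commands and field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    language: str
    doc_type: str


def limit(records, *, years=None, language=None, doc_type=None):
    """Keep only the records that satisfy every limit that was supplied."""
    kept = []
    for r in records:
        if years is not None and not (years[0] <= r.year <= years[1]):
            continue
        if language is not None and r.language != language:
            continue
        if doc_type is not None and r.doc_type != doc_type:
            continue
        kept.append(r)
    return kept


# Invented answer set standing in for the records retrieved so far.
hits = [
    Record("District heating networks", 1998, "English", "article"),
    Record("Fernwaerme in Staedten", 2003, "German", "article"),
    Record("District heating handbook", 2004, "English", "book"),
    Record("Heat metering in district schemes", 2005, "English", "article"),
]

recent_english = limit(hits, years=(2000, 2005), language="English")
articles_only = limit(recent_english, doc_type="article")
print(len(hits), len(recent_english), len(articles_only))
```

Applying one limit at a time, as here, makes it easy to see 
which restriction removes the most records.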


4.12.6. Saving the Results 


You can capture selected records using Print/Save 
options. Besides, end-user services are also important in 
virtual or online access. End-user services in the virtual library 
environment cover three main elements: 


— End-user access to online tools and electronic full-text, 
— End-user search training, and 
— The facility for direct user requesting of materials. 


Libraries are tending to offer end-user access to 
materials because users have the expectation of access to 
materials, in full-text, and on their desktop. Direct requesting 
is usually applied in the form of loan and inter-library loan 
functions, often using email and/or the Internet. In terms of 
tools, place and time, users who are able to access resources 
themselves are advantaged over those who cannot. In each 
case, users are empowered, while library staff can devote 
time that used to be spent on processing to other matters. 


Again, there are several difficulties inherent in end-user 
services. There are policy implications for what people can 
and cannot have access to, circulation policies, loan periods, 
borrowing limits, and other related matters. Likewise, libraries 
that participate in such schemes need to show commitment 
to consortial users, not just local users. There is also a danger 
of wasteful and/or malicious requesting. User support and 
training must also be increased in the electronic environment, 
so that clients can effectively use the services that have been 
put in place. 


4.13. SOME NEW RESOURCES 


As technology advances, some new means of 
information generation and dissemination are coming into 
existence. These include wikis and blogs, which are briefly 
introduced here. 


4.13.1. Wiki 


The term wiki is a shortened form of wiki wiki, which is 
from the native language of Hawaii (Hawaiian), and is 
commonly used as an adjective to denote something quick 
or fast. The term wiki can also refer to the collaborative 
software itself, the wiki engine, that facilitates the operation of 
such a website. 


The first wiki, WikiWikiWeb, is named after the Wiki 
Wiki line of Chance RT-52 buses in Honolulu International 
Airport, Hawaii. It was created in 1994 and installed on the 
Web in 1995 by Ward Cunningham, who also created the 
Portland Pattern Repository. A wiki is a type of website that 
allows anyone visiting the site to add, remove, or otherwise 
edit all content, quickly and easily, often without the need for 
registration. This ease of interaction and operation makes a 
wiki an effective tool for collaborative writing. 


In essence, a wiki is nothing more than a simplified 
system of creating HTML web pages combined with a system 
that records and catalogues all revisions so that at any time 
an entry can be reverted to a previous state. A wiki system 
may also include various tools designed to provide users with 
an easy way to monitor the constantly changing state of the 
wiki as well as a place to discuss and resolve the many 
inevitable issues, most related to the inherent disagreement 
over wiki content. Wiki content can also be misleading, as 
users are bound to add incorrect information to the wiki page. 
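
The idea of recording every revision so that an entry can be 
reverted may be sketched in a few lines. The WikiPage class, 
its method names and the sample edits are assumptions made for 
illustration; they are not the design of any particular wiki 
engine.

```python
class WikiPage:
    """A page that keeps every saved state, oldest first."""

    def __init__(self, title, text=""):
        self.title = title
        self.revisions = [("system", text)]  # list of (editor, text) pairs

    @property
    def text(self):
        """The current text is simply the newest revision."""
        return self.revisions[-1][1]

    def edit(self, editor, new_text):
        """Save a new state without discarding the earlier ones."""
        self.revisions.append((editor, new_text))

    def revert_to(self, index, editor):
        """Restore an earlier state by saving it again as the newest revision."""
        _, old_text = self.revisions[index]
        self.edit(editor, old_text)


page = WikiPage("District heating", "Original text.")
page.edit("203.0.113.7", "Incorrect text added by a visitor.")
page.revert_to(0, "librarian")
print(page.text, len(page.revisions))
```

In this sketch a revert is itself recorded as a revision, so 
the incorrect edit stays visible in the history even after it 
has been undone.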


Some wikis will allow completely unrestricted access 
so that people are able to contribute to the site without 
necessarily having to undergo a process of registration, as 
had usually been required by various other types of interactive 
websites, such as Internet forums or chat sites. 


Contributors : Wikipedians are volunteers, including a 
core group of about 2,000, and you know what they say about 
volunteers. Managing them is like herding cats. But, like cats, 
these volunteers manage themselves pretty well, a feat that 
seems next to dumbfounding. An international nonprofit, the 
Wikimedia Foundation, manages the infrastructure and pays 
the bills, but it doesn’t run the endeavor in a top-down fashion. 


What characterizes these volunteers? For sure they 
have online access. They are skilled in using wikis, which 
implies a certain level of both intelligence and geekiness. 
Wikipedia’s contributors are people with time on their hands, 
for sustained participation takes time. 


Why do they contribute? In today’s busy world with time 
at such a premium and most of us overworked, who would 
take the time from their busy schedule on a regular basis to 
do careful research and meticulous writing? Articles are not 
signed, so it cannot be for the glory, although Wikipedia leader 
Jimmy Wales says that recognition within the community, 
where you do get known, serves as a powerful motivator for 
some. Some contributors may harbor personal or 
organizational agendas, but with a bunch of picky people 
overseeing their contributions, expression of those agendas 
in articles is not likely to last long. 


Surveys of open-source project participants have found 
that some sort of public interest or community spirit is often 
part of the motive. These enterprises offer an opportunity to 
contribute to something that has lasting value and will continue 
to grow. Open-source publishing allows writers and software 
developers to apply their skills outside a strictly business 
environment. Casual writers and editors sometimes 
participate as a hobby or learning experience. 


Britannica’s contributors are chosen for their 
professional expertise: As the company’s literature says, they 
are “Nobel Prize winners, authors, curators, and other 
experts.” Another blurb says, “Most are authors, university 
professors, commentators, museum curators, scientists, and 
other experts chosen for their field expertise.” These writers 
get paid for their work on the encyclopedia and they get 
bylines. Tom Panelas, director of corporate communications 
at Britannica, says : 

Essentially we look for the best expert on every subject 
and try to commission an article from him or her. We have 
had good luck most of the time. Our contributors have included 
Einstein, Freud, Marie Curie, and more than 100 Nobel 
laureates, including many that write for us today, such as 
Milton Friedman. Top historians such as Joseph Ellis and 
Robert Dallek are among our contributors today. We go about 
selecting these people through a number of means. Our 
editors are knowledgeable in the subjects they cover, and 
we also have many outside scholars and experts advising 
us, such as our editorial board, which itself has several Nobel 
Prize winners and university presidents. These people 
oversee our staff editors, give them guidance, and suggest 
contributors and other advisors to us. We have about 4,800 
contributors worldwide. 


Asked whether any Britannica contributors write for 
Wikipedia, Panelas says, “Not that we know of. Our 
contributors tend to be busy and serious people who expect 
to be paid for their work. They also want their handiwork 
respected and taken seriously, and few would want to submit 
something that would be subject to the whims of someone 
who knows little or nothing about the subject.” 


Who exactly are the users of both Britannica and 
Wikipedia? 


Britannica’s Panelas says, “Our customers tend to be 
knowledge and information seekers, a broad group consisting 
of students, professionals, and lifelong learners. They tend 
to be better educated than the population as a whole, or they 
aspire to be. Beyond that they share few demographic 
characteristics.” 


Wikipedia’s users are potentially everyone under the 
sun. Because it has versions in about 200 languages, its reach 
is potentially far greater than that of Britannica. Britannica 
offers only an English-language version, although the 
company does produce other works in other languages. 


So not only do the characteristics of Wikipedia’s and 
Britannica’s contributors differ, so do their audiences. 
Wikipedia’s audience is far more general than that of 
Britannica, which implies that its mission and scope must be 
so as well. 


Scope : Wikipedia’s guidelines also say that subjects 
of articles should be notable. The community pages explain 
that what constitutes notability is always under debate: “Few 
of us believe that there should be articles about every person 
on Earth, every company that sells anything, or each street 
in every town in the world.” When asked about that criterion, 
Wales glosses over it and says that the information needs 
verifiability. “Notability is actually a very controversial 
requirement within the community simply because it’s so 
subjective. What’s notable enough? So what we prefer to do 
is more or less shy away from notability, just because it ends 
up being a pretty unproductive discussion and focuses a lot 
more on things like verifiability: whether or not the information 
can be verified. That’s a much easier thing to decide rather 
than ‘Is it important enough?’ That’s a very tough argument 
to have.” He concedes that determining whether something 
is verifiable entails a complex process, but essentially, it 
means attribution to a reputable source. 


When asked to compare Britannica's scope with that 
of Wikipedia, Panelas says, “We can’t cover as many things 
as they do, but we wouldn't even try to. What they do is very 
different from what we do. We don’t have an article on extreme 
ironing, and we shouldn't. Wikipedia does what it does, and 
their strengths come at a cost. The cost of piling up large 
numbers of articles is a high level of inaccuracy, sloppiness, 
and just plain poor articles. For some people it’s a price worth 
paying, and that’s fine. There’s room in the world for many 
sources of information with different virtues and 
shortcomings.” 


Wikipedia Process : Wikipedia exemplifies a fascinating 
new paradigm. It is open to everyone, not only to read but 
also to create and maintain, and governed primarily by 
community consensus. This model is so disruptive that it’s 
worth examining in some detail. 


Anyone can edit a Wikipedia article. Until recently, when 
a brouhaha erupted over alleged character assassination in 
an article about John Seigenthaler, an associate of Robert F. 
Kennedy, anyone could initiate an article. The Seigenthaler 
article’s author, who was identified shortly after the story 
broke, said he was only joking. Now you must be a registered 
user to offer an article, but, of course, anyone can register. 
The logic behind the change is that forcing people to register 
will slow down the creation of new pages and allow quality 
checkers to keep up. According to Jimmy Wales, quoted in 
Business Week on December 14, 2005, “We are preventing 
unregistered users from creating new pages because so often 
those have to be deleted.” 


Articles are not signed, but every change is linked to 
some kind of identifier, either a user name or an IP address. 
A history page for each article shows the text of every change 
and the identifier of the person who made the change. You 
can see all changes made by an individual, compare versions 
by hitting a button labeled Compare Selected Versions, and 
see at a glance whether previous versions include major or 
minor edits. These abilities allow users and non-users alike 
to spot trends and, potentially, agendas. Users who abuse 
the system are blocked. 
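
Comparing two versions from a history page can be imitated 
with Python's standard difflib module. The history entries 
below, including the identifiers and texts, are invented; the 
sketch shows only the principle of listing what changed between 
two saved versions and is not Wikipedia's own software.

```python
import difflib

# Each history entry pairs an identifier (user name or IP address) with the text.
history = [
    ("Editor_A", "A wiki is a website.\nOnly editors may change it."),
    ("198.51.100.23", "A wiki is a website.\nAnyone may change it."),
]


def compare_versions(old, new, old_label="previous", new_label="current"):
    """Return unified-diff lines showing what changed between two versions."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


diff = compare_versions(history[0][1], history[1][1])
for line in diff:
    print(line)
```

Lines beginning with a minus sign were removed and lines 
beginning with a plus sign were added, which is enough to see 
at a glance whether an edit was major or minor.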


All changes are tracked. As new changes come in, the 
changes go onto a list for easy spotting. This practice is 
supposed to help the community keep an eye on everything 
and exercise quality control. Sometimes it fails, largely due 
to the volume of edits. Sometimes the problem is that an article 
is not well linked to anything else. That is how the false 
Seigenthaler article managed to stay intact for 123 days 
before discovery. 


Why not sign articles? Since no one owns any part of 
any article, if you create or edit an article, you should not sign 
it. On the other hand, when adding comments, questions, or 
votes to back-end pages, it is good to own your text. So the 
best practice is to sign it. 


The idea behind Wikipedia is that it is self-cleaning. If 
someone posts an article or change that includes an error, 
the community will find the error and fix it. This approach 
resembles that of the open-source software community, 
where code is open and available to all, and where thousands 
of eyes are more likely to spot problems than just a few. 
Wikipedia is a bit different from open-source software, though, 
as Jimmy Wales points out. With open-source software, a 
final version emerges as the official issue, at least for that 
release. Wikipedia is never locked for good; there is never 
an official version of an article. 


Wikipedia requires that participants take neutral stances 
and write without bias, which is not always easy to do. 
“Wikipedia represents a belief in the supremacy of reason 
and the goodness of others.” Yes, people will clash, but 
respectfully, and out of their conflict, something like the truth 
will emerge. Whether the system works depends upon several 
things happening: (i) someone who knows what they are doing 
actually finding the error; (ii) noble, nonpartisan intentions; 
(iii) members practicing the philosophy “If it ain’t broke, don’t 
fix it”; and (iv) the existence of a community familiar with the 
rules and respectful of its members, except for trolls and 
vandals. 


Community is key in Wikipedia. Anyone can participate, 
but a relatively small core community does most of the work. 
There are written community standards, like intolerance for 
bad behaviour — vandalism, trolling, personal attacks; 
encouragement of a friendly, helpful, thoughtful environment; 
and writing from a neutral point of view. As Wales puts it, 
“The wiki process, in and of itself, is something of a mutually- 
assured-destruction type of process. In other words, if you 
write something that is biased, it will just be deleted. And so 
everybody who participates has an incentive to try to write 
for the enemy, as we put it, or write for people who may not 
agree with you, and try to phrase things in a way that is as 
neutral as you possibly can because that is the only way to 
write something that will survive the test of time.” 


Authority : As it is difficult to hit a moving target, so is it 
difficult to evaluate Wikipedia’s authority. One minute an 
article may be flawed; another, it may be capable of satisfying 
most experts. Users who rely on Wikipedia as a sole source 
are playing roulette, even if they check and recheck entries. 


Nancy O’Neill, Principal Librarian for reference services 
at the Santa Monica Public Library System, says that there is 
a good deal of skepticism about Wikipedia in the library 
community. She also admits cheerfully that Wikipedia makes 
a good starting place for a search. You get terminology, 
names, and a feel for the subject. Wales says, “I guess the 
main thing is people need to understand that Wikipedia is 
very much a work in progress. That it is in many places very 
high quality, but because it is an open-ended work in progress, 
there can be mistakes and errors that have not been caught 
yet. I would treat it as an excellent starting point to get some 
basic background information before doing further research.” 
But as Peter Morville, an expert in information architecture, 
reminds us in “How Findability Determines Authority Online: 
The Wikipedia Phenomenon,” “Authority derives from the 
information architecture, visual design, governance, and 
brand of the Wikipedia, and from widespread faith in 
intellectual honesty and the power of collective intelligence.” 
He feels that Wikipedia does a great job in these areas and 
that it beats Britannica because, in the spirit of Google, it is 
“more findable”; that its “multi-algorithmic,” Google-derived 
approach, which includes full-text searching, internal link 
structures, metadata, and free tagging, is the point. 


This is interesting stuff. Today’s developers and avid 
web users are thinking in ways that are as different to some 
of us as Western and Eastern cultures are to each other. 
Morville indicts the authority of traditional sources as much 
as that of Wikipedia: “Even the revered Encyclopaedia 
Britannica is riddled with errors, not to mention the subtle yet 
pervasive biases of individual subjectivity and corporate 
correctness.” And therein lies the rub: there is no one perfect 
way. Britannica seems to claim that there is. Wikipedia 
acknowledges there is no such thing. 


Librarians and information professionals have always 
known this. That is why we always consult multiple sources 
and counsel our users to do the same. If we adhere to that 
practice, what are we worrying about? 


Wikipedia embodies a collaboration frenzy as hot as 
tech start-ups in 1999, but let’s not forget that there are two 
schools of thought on collaboration. One says the more minds, 
the more refinement, nuance, and innovation achievable. The 
other quotes the old saw, “A camel is a horse designed by a 
committee.” The problem with both approaches is that the 
search for truth is an ongoing process. An encyclopedia entry 
can be accurate as far as it goes but is rarely complete. It 
may represent a temporary consensus, where “temporary” 
could mean a few minutes or a few decades. 


Wikipedia is believed to be self-cleaning and evolving, 
and Wales has announced that eventually Wikipedia will 
consist of a stable version of pages vetted for accuracy before 
being seen by the public. As far as accountability is concerned, 
let's set some consistent standards and stop worrying about 
ridiculous lawsuits like the class action suit some nut job is 
attempting to put together. Every source has errors that 
propagate every time someone reads, hears, or watches 
them. 


4.13.2. Blogs and Weblogs 


Blogs are sites that capture particular views, ideas, 
or opinions over time. They are web applications that 
contain periodic posts on a common web page. These posts 
are often, but not necessarily, in reverse chronological order. 
Each blog tells a story, be it about a person, an organization, 
an event or other subjects such as the environment, 
healthcare, disasters, languages or literature. In a blog, 
entries are read, commented on and discussed by an even larger 
community, fostering active debate. This may sound 
similar to a listserv and in some ways, it is. However, unlike a 
listserv, blogs are typically accessible to the public at large, 
thereby encouraging more diverse relationships and 
disseminating information further and with greater speed. 


Blogs have become so ubiquitous that for many people 
the term is synonymous with “personal website”— though 
many commercial sites now incorporate one. For others, they 
are sites made with blogging software, which seems obvious— 
except that a few of us still update our sites by hand. But the 
form is familiar— frequently updated, reverse-chronological 
entries on a single web page. In 1999, there were not yet 
tools designed specifically for creating weblogs. Some 
programmers created or adapted software to maintain their 
blogs. The rest of us hand-coded our sites. HTML is simple 
enough for any motivated amateur to learn, so the bar was 
not very high. 


Jorn Barger, editor of Robot Wisdom, coined the term “weblog” 
in 1997; he defined it as “a web page where a weblogger ‘logs’ 
all the other web pages she finds interesting.” Weblogs were distinct 
in both form and content from the web journals that had 
preceded them. At that time, journals were personal accounts 
chunked into individual pages— one entry per page, one page 
per day, as if a paper diary had been transplanted to the Web. 
By contrast, weblog entries were short, usually contained links 
to the larger Web, and appeared all together on one long 
page. Many were updated throughout the day. 


Weblogs were also distinct from e-zines. E-zines were 
published on a schedule, like paper periodicals, and contained 
longer original articles and artwork. They required planning, 
organization, and a certain level of skill in layout, typography, 
and the other elements of web design. By contrast, weblogs 
were rudimentary in design and content. Indeed, many 
zinesters disdained the new form, opining that the Web would 
soon be filled with pages of links, all pointing to one another, 
with no original content anywhere. 


In late 1999, several companies released software 
designed to automate weblog publication. One of these 
products was called Blogger, and the press could not get 
enough of it. For journalists, Blogger epitomized the dot-com 
era: founders Meg Hourihan and Evan Williams were in their 
20s; their free, wildly popular product had no discernible 
business plan; and their tagline, “Pushbutton publishing for 
the people,” promised to revolutionize the Web. Blogger really 
was easy to use. When news stories began defining weblogs 
as “websites made with Blogger,” it quickly became the most 
widely used blogging tool. And that changed weblogs. It was 
an interface decision that did this. Consider Pitas, another 
early weblog updater, which provided users with two simple 
form boxes— one for a URL and one for the writer's remarks. 
Hitting the Post button generated a link followed by 
commentary. 


Blogger was simpler still, consisting of a single form 
box field into which bloggers typed whatever they wanted. It 
is sometimes wondered whether the new bloggers knew 
enough HTML to construct a link. Whether they did or not, 
Blogger was so simple that many of them began posting 
linkless entries about whatever came to mind. Users who kept 
Blogger open all day may have found searching the Web for 
links to be something of a nuisance. It was much easier to 
reference friends’ sites, or omit the link altogether. 


So, with the overwhelming adoption of Blogger, and 
without an interface that emphasized links as the central 
element of the form, the blog-style weblog was born. In the 
original weblog community, much controversy ensued. These 
are diaries, not weblogs! Weblogs are about links! 


Evan Williams has said that he understood early that 
weblogs are about the format, not the content. But he did not 
understand something about the filter-style weblog and the 
aims of the community that invented it. At least some of us 
thought that through the careful selection and juxtaposition 
of links, weblogs could become an important new form of 
alternate media, bringing together information from many 
sources, revealing media bias, and perhaps influencing 
opinion on a wide scale—a vision called participatory media. 


In 2000, Blogger introduced an innovation that would forever 
change the face of weblogs: the permalink. From the 
start, webloggers had frequently referenced other blogs. It 
was awkward but this cross-blog talk was so compelling it 
became a primary focus of entire weblog clusters. Permalinks 
gave each blog entry a permanent location at which it could 
be referenced—a distinct URL. Previously, weblog archives 
had been navigable only through browsing. Now, bloggers 
could reference specific weblog entries as elegantly as they 
referenced any online source. The feature was so useful that 
it became a canonical component of the standard weblog 
entry. In a medium whose currency is links, weblogs without 
permalinks were at a sudden disadvantage. Hand-coders had 
to invent ways to reproduce this feature if they wanted to be 
referenced on other blogs. 
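
A hand-coder wishing to give each entry a distinct, permanent 
URL might proceed along the following lines. The URL pattern 
(date followed by a shortened title), the function names and the 
address blog.example.org are assumptions chosen for 
illustration; they are not the scheme Blogger itself used.

```python
import re
from datetime import date


def slugify(title):
    """Reduce a title to lower-case words joined by hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def permalink(base_url, posted, title):
    """Build one fixed address per entry from its date and title."""
    return f"{base_url.rstrip('/')}/{posted:%Y/%m/%d}/{slugify(title)}.html"


link = permalink("http://blog.example.org/", date(2000, 3, 5), "New Books: Spring List!")
print(link)
```

Since the address depends only on the date and title of the 
entry, it stays the same however many newer entries are later 
added to the front page.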


To some extent, the permalink also elevated weblog 
commentary to a legitimate form of discourse. A link is, after 
all, a link. Whether it leads to a weblog entry or a syndicated 
column, each link on a page has equal weight. If the nature 
of weblogs is to democratize publishing, perhaps the nature 
of hypertext is to equalize influence, at least within the context 
of the page. 


Cross-blog talk inspired development of another 
innovation— comments. For those whose software did not 
provide this capability, enthusiastic hackers, coding for fun, 
created remote commenting systems. Invariably, these early 
commenting systems—hosted, perhaps, in somebody's 
basement—would quickly bog down, slowing loading times to 
a crawl. Bloggers would change services or abandon 
comments altogether. But the lure of public conversation is 
so strong that as early as 2001 Blogger was the only major 
blogging tool without commenting capability. For many, 
weblogs are unthinkable without comments and the 
community of readers that comments make visible. Indeed, 
some have criticized comment-free weblogs as merely an 
inferior form of broadcast media. Commenting has meant a 
further democratization of publishing, creating an even lower 
bar for readers to become writers. 


Trackback, introduced by Movable Type in 2001, 
automated cross-blog talk itself. Trackback allows a blogger 
to ping another weblog, placing a reciprocal link—a trackback— 
in the entry he has just referenced. Previously, bloggers 
scoured referrer logs to discover references to their sites. 
Trackback has made these formerly invisible connections 
visible, inviting instant response. Trackbacks, often 
interspersed among site comments, emphasize the 
conversational nature of the weblog form while collating for 
readers all available responses to an entry. Like permalinks 
and comments, trackback has raised the bar for software 
vendors and hand-coders alike. 
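
A trackback ping is, in essence, a small HTTP POST request sent 
to the weblog being referenced. The sketch below prepares such a 
request and reads the reply, without sending anything over the 
network. The field names (url, title, excerpt, blog_name) and 
the reply format follow the published TrackBack specification as 
best understood here; the function names and sample addresses 
are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_ping(entry_url, title="", excerpt="", blog_name=""):
    """Return (headers, body) for a ping; 'url' is the only required field."""
    fields = {"url": entry_url}
    optional = (("title", title), ("excerpt", excerpt), ("blog_name", blog_name))
    for key, value in optional:
        if value:
            fields[key] = value
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    return headers, urlencode(fields)


def ping_succeeded(response_xml):
    """Treat a reply containing <error>0</error> as success."""
    root = ET.fromstring(response_xml)
    return root.findtext("error", default="1").strip() == "0"


headers, body = build_ping(
    "http://blog.example.org/2000/03/05/new-books.html",
    title="New books",
    blog_name="Library News",
)
print(body)
print(ping_succeeded("<response><error>0</error></response>"))
```

The receiving weblog can then display the title and address it 
was sent as a reciprocal link beneath the referenced entry.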


This repeated pattern—development of free tools in 
response to widespread practice—continues to shape weblogs 
and blogging. Services now automate everything from site 
syndication to displaying reading lists. Websites rank the most 
popular weblogs and list recently updated blogs. When any 
sizable number of bloggers start doing something, someone, 
it seems, will construct a tool to automate it—further 
popularizing the activity. 


It is estimated that by the end of 2004 blogs had 
established themselves as a key part of online culture. The 
findings of two surveys by the Pew Internet and American 
Life Project in November established new contours for the 
blogosphere and its popularity: 

• Seven percent of the 120,000,000 US adults who use 
the Internet say they have created a blog or web-based 
diary. That represents more than 8,000,000 people. 


• Twenty-seven percent of Internet users say they read 
blogs, a 58 percent jump from the 17 percent who told 
us they were blog readers in February 2004. This means 
that by the end of 2004, 32,000,000 Americans were 
blog readers. Much of the attention to blogs focused 
on those that covered the 2004 political campaign and 
the media. And at least some of the overall growth in 
blog readership is attributable to political blogs. Some 
9 percent of Internet users said they read political blogs 
“frequently” or “sometimes” during the campaign. 


• Five percent of Internet users say they use RSS 
aggregators or XML readers to get the news and other 
information delivered from blogs and content-rich 
websites as it is posted online. 


• The interactive features of many blogs are also catching 
on: 12 percent of Internet users have posted comments 
or other material on blogs. 


• At the same time, for all the excitement about blogs 
and the media coverage of them, blogs have not yet 
become recognized by a majority of Internet users. Only 
38 percent of all Internet users know what a blog is. 
The rest are not sure what the term “blog” means. 
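
The figures quoted in this list can be checked with simple 
arithmetic. The short sketch below uses only the percentages and 
the total of 120,000,000 Internet users given above; the variable 
names are chosen for illustration.

```python
internet_users = 120_000_000

# Integer arithmetic avoids rounding surprises with percentages.
creators = internet_users * 7 // 100   # share who have created a blog
readers = internet_users * 27 // 100   # share who read blogs

# Growth from 17 percent to 27 percent, relative to the starting value.
relative_jump = (27 - 17) / 17 * 100

print(creators, readers, round(relative_jump, 1))
```

The results agree with the rounded statements in the list: more 
than 8,000,000 creators, about 32,000,000 readers, and a rise 
in readership of roughly 58 percent.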


Blog creators are more likely to be — Men — 57 percent 
are male; Young — 48 percent are under age 30; Broadband 
users — 70 percent have broadband at home; Internet 
veterans — 82 percent have been online for six years or more; 
Relatively well off financially — 42 percent live in households 
earning over $50,000; Well educated — 39 percent have 
college or graduate degrees. 


But in spite of this popularity, blogs are still not that 
well known. Even so, blogs find applications in library and 
information centres. Libraries are in a position to be leaders 
in utilizing weblogs for communication purposes. The 
possibilities are infinite and the projects are easy to set up. A 
few ways in which libraries can use weblog technology to 
enhance services and communicate with each other are 
described below. This list is by no means comprehensive. 
The power of weblog technology can be applied to almost 
any aspect of a library web page. 


Blog as a Library Newsletter : The necessity of keeping 
library patrons aware of services and resources has not 
diminished. In fact it is even more urgent as the explosion of 
online content and services has made it more difficult for 
patrons to keep up with new resources and changes to those. 
Many libraries have “What’s New” sections on their Web sites 
to publicize new resources, services, or events, but blogs 
can serve as an alternative to static print newsletters for 
keeping patrons informed about library services and 
resources. A librarian can post information quickly, easily and 
immediately. Implementing a library blog instead of a print 
format newsletter shifts the focus from time-consuming 
layout and production issues to rapid dissemination of 
relevant, quality news and information. A library blog can be 
updated quickly, easily, and as frequently as needed, while 
patrons can read it whenever they choose. Blogs can support 
the goals of the library while simultaneously meeting the 
specific objectives of librarians. At the same time librarians 
can use blogs along with other tools like e-mail to keep patrons 
continually aware of the services and resources available to 
assist them in meeting their own research and educational 
objectives. 


Blog as a Reference Desk : Some reference desks 
have a notebook that keeps the librarians current with what 
has been happening in the library. Examples of this current 
awareness include new homework assignments and 
Frequently Asked Reference Questions (FARQs). A weblog 
could enhance this service with not only pertinent 
information, but also links to web sites for assistance with 
the homework assignments and quick answers to the FARQs. 
If the entire reference staff had access to the weblog, they 
could communicate with each other rather easily. Weblogs 
can also be used for discussion among the library staff. In 
large buildings where it is impossible to communicate with 
everyone at the same time, a weblog can carry the minutes of 
meetings, handouts from training sessions, or even links to 
articles for professional development purposes. The library 
should consider putting these types of weblogs behind an 
Intranet if it does not want them published on the Internet 
for anyone to access. 


Blog and New Acquisitions : The library might consider 
utilizing weblog technology to announce new acquisitions to 
the collection. There can be multiple weblogs for different 
types of media or, one weblog can be used for the entire 
collection and each post could be routed to specific categories 
and the user could choose which ones to read. 


Syndication can be used to display new acquisitions. 
As soon as a book or movie is entered as available, the 
catalogue system could notify the weblog, and the new 
material could be posted to the weblog within minutes. 
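
The syndication idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any particular library system: the function name, the dictionary keys and the example.org addresses are all hypothetical, and the feed follows the general layout of RSS 2.0 (a channel containing items, each with a title, link, category and publication date).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_acquisitions_feed(library_name, feed_url, acquisitions):
    """Return an RSS 2.0 document (as a string) listing new acquisitions.

    `acquisitions` is a list of dicts with hypothetical keys:
    title, link, category, added (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"{library_name}: New Acquisitions"
    ET.SubElement(channel, "link").text = feed_url
    ET.SubElement(channel, "description").text = "Items recently added to the collection"

    # Newest items first, so readers see the latest additions at the top.
    for record in sorted(acquisitions, key=lambda a: a["added"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "link").text = record["link"]
        # The category lets a user subscribe only to books, films, etc.
        ET.SubElement(item, "category").text = record["category"]
        ET.SubElement(item, "pubDate").text = format_datetime(record["added"])

    return ET.tostring(rss, encoding="unicode")


# Made-up sample records standing in for entries from a library catalogue.
sample = [
    {"title": "Introduction to Metadata", "category": "Books",
     "link": "https://library.example.org/item/1",
     "added": datetime(2010, 3, 1, 9, 0, tzinfo=timezone.utc)},
    {"title": "Documentary: The Printing Press", "category": "Films",
     "link": "https://library.example.org/item/2",
     "added": datetime(2010, 3, 2, 14, 30, tzinfo=timezone.utc)},
]
feed = build_acquisitions_feed("Example Library", "https://library.example.org/new", sample)
print(feed)
```

A weblog or feed reader that polls such a feed would pick up each new record shortly after it is added, and the category element supports the per-media routing described above.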


Blog and Library Consortia : Libraries that belong to 
consortia can utilize weblogs to communicate between 
libraries, set up meetings, post new resources added to 
shared documents, discuss problems with these same 
resources, report downtime, and discuss possible 
acquisitions. This, of course, does not have to be done only 
at the county level. Statewide library systems can easily 
communicate information to each other using weblog 
technology. If management notices a lack of communication 
and network among any set of libraries, weblogs may be the 
key to jumpstart these relationships. By having all librarians 
participating in the weblog experience, morale may be 
boosted, thus improving work product. 


Blog as a Book Club : Librarians can use weblogs not 
only to publish their information instantly, but also to 
encourage members of a library book-club weblog group to 
post comments or questions about the reading material. New 
books for discussion can be posted, and readers who have 
busy schedules can participate virtually on their own time by 
posting to the weblog. Librarians who lead book discussion 
groups can be the administrators of these weblogs and can 
have complete control over them. 

Blog as a Marketing Tool in Libraries : Librarians have 
had to learn how to do a lot with just a little in order to promote 
awareness of their programs and services. They have seized 
the opportunities to market libraries in the real world via 
traditional media: newspapers, corporate newsletters, radio, 
and TV. Many libraries produce brochures, pathfinders, and 
their own newsletters. So it is no surprise to see librarians 
stepping up to the plate and spreading the word online with 
blogs. Savvy librarians have identified blogs as another means 
to market libraries and their services. Regardless of the type 
of blogging option a library selects, marketing it is essential. 
The real challenges in making library blogs will involve 
marketing and maintenance: getting patrons to visit, and 
offering the valuable content that will bring them back. 


So, weblogs are an excellent way to stay current. News 
travels down the blogging pipelines long before it appears in 
print and, in many cases, online magazines and journals. 
Librarians are great filters of information and relying on a 
select group to provide your daily information can be a great 
time-saver. Moreover, the ASIST Professional Guidelines 
state that information professionals should seek “to extend 
public awareness and appreciation of information 
availability.” As such, librarians and information 
professionals should not only provide information on demand 
and act as intermediaries between the users and the 
information, but should also alert users to the existence of 
novel, relevant information, provide access, and help users 
and fellow professionals to utilize resources, technologies 
and information retrieval tools efficiently. An additional 
responsibility of the information professional is to “uphold 
each user’s, provider’s, or employer’s right to privacy and 
confidentiality and to respect whatever proprietary rights 
belong to them”. With the 
increased complexity of the application of the principles of 
fair use, copyright, privacy and intellectual property in the 
electronic world, the information professional must be 
constantly aware of the developments in these areas. 
Weblogs are ideal for disseminating all types of information 
chosen by the blogger, for commenting, expressing opinions 
and for discussing implications. 


Besides, librarians working in public service, 
information, and reference can encourage students working 
in groups to create blogs to be used as project management 
tools. For example, at the University of Alberta, students in 
the fourth-year mechanical engineering design course must 
work in groups of four on their capstone projects. The 
mechanical engineering librarian was approached by students 
interested in ways of managing the amount of information 
being shared between group members. Was there a way to 
deal with multiple e-mails, phone calls, paper trails? With basic 
guidance and assistance from the mechanical engineering 
librarian, a number of these groups are using blogs as a 
central online location for information sharing, gathering, and 
comments. Minutes of their meetings, allowing for future 
reference to past decisions, are posted, as are links to sites 
of interest, including patents, and design examples. 

But the librarians who choose to encourage students 
to set up their own blogs need to be familiar with the 
blogging software involved, or with other software of a 
similar nature. 


4.12.3. Facebook 
Facebook has been described as one of the “trends of our 
users that we just cannot ignore.” Over the last few years, 
Facebook has emerged as a significant and useful means for 
professors, librarians, and campus administrators to reach 
students before, during, and after their collegiate 
experience. Facebook also makes friendships with people a 
little more intimate. 

Many people use Facebook a lot, but it seems that they 
mostly use it after they have met someone. If they meet a 
person once, they keep in contact afterwards, even though 
they knew that person only a little. People may not make new 
friends through Facebook, but they may make better friends 
through it. It is hard to imagine what people did before, 
because it is now so easy to access information about people 
on Facebook. Questions like “Does anyone have this person’s 
phone number?” or “Where do they live? They live in this 
dorm, but we need the room number”, which used to be 
time-consuming and tough to answer, are now easily answered 
on Facebook. 

Facebook is very good for networking. Almost every 
person in a school or college will make an effort to maintain 
their Facebook friendships, so that when they are in their 
forties and go back to a reunion, they will still be able to 
get in touch with each person they knew. They will know that 
so-and-so is now a doctor or a businessman, and they will 
not hesitate to call on them for a favour, just because they 
went to the same high school. Thus, Facebook may help in 
resurrecting past friendships. 

Facebook has many characteristics, but the important 
ones may be described as under: 

Facebook is primarily, and often exclusively, a 
recreational space, and is heavily used by students for social 
purposes : Online observations reveal that students use 
Facebook primarily as a resource for managing existing 
face-to-face relationships, no matter how tenuous, but rarely 
use Facebook to initiate new relationships without at least 
some prior offline basis of interaction. 


Students’ “academic” uses of Facebook are still 
inherently social : The majority of students use Facebook 
to communicate about course assignments; however, this 
communication is primarily logistical and concerns such 
matters as missed lecture notes, paper due dates, and 
assignment guidelines. Although some students discuss 
academic interests on Facebook, it is generally seen as an 
inappropriate place for serious academic discussion. 


Some students are using Facebook to arrange 
face-to-face study sessions with friends and classmates. 
Facebook’s most substantial contribution to academics, 
however, occurs when students take “Facebook breaks” as 
a reward for studying. Interestingly, few students claimed 
that Facebook negatively impacted their academic work. 


Students generally perceive the presence of non-peers, 
especially authority figures, on Facebook as an intrusion : 
Survey and other data indicate student uncertainty 
regarding the presence of authority figures on Facebook. 
Students expressed ambivalence, but tended to be more 
hesitant than enthusiastic about librarians’ involvement with 
Facebook. Some students described the prospect of 
student-librarian Facebook interactions as “weird” or 
“awkward.” 

The potential benefits of Facebook may be listed as 
below : 


— Already integrated into students’ daily practices in many 
advanced countries. 


— Higher level of engagement. 


— Potential to make identity information more salient 
during class discussion. 


— Adds “social” peer to peer components. 
— Improve digital literacy skills. 


Facebook is a privately held corporation and thus 
unaccountable to higher education institutions, but Social 
Networking Sites (SNSs) can have positive outcomes for 
students and the community: 


— For social capital findings, 
— For improving digital literacy skills, 
— For improving critical professional skills. 


Libraries in particular have been eager to capitalize on 
Facebook’s potential for building and maintaining relationships 
with students. Libraries have created search bar applications 
(e.g., for JSTOR), communication tools, and more 
comprehensive virtual library service applications. Research 
on Facebook use in libraries has encouraged librarians and 
other library staff members to create their own profiles and 
join Facebook groups as a way of sharing information 
amongst colleagues and with patrons. “Friending” students 
from instruction sessions and customizing profiles to highlight 
library resources are other recommended methods of library 
Facebook usage. At one institution, a librarian created his 
own one-man Facebook campaign, sending direct messages 
about the library to more than 1,500 students through 
Facebook. The response rate to this campaign was low, but 
the responses—both online and in-person—provided 
opportunities for meaningful interactions between the librarian 
and patrons. 


While the library literature acknowledges the social 
nature of Facebook, in practice libraries seem to assume that 
students will be open to developing relationships with 
librarians through Facebook based on personal interactions 
and the utility of the library resources now available through 
Facebook. This assumption that students will perceive and 
interact with the online presence of an institution like the library 
just as they do peers is problematic and deserves scrutiny. 
By framing Internet-mediated practices largely in terms of 
information access and utility, libraries fail to recognize or 
engage the sociocultural motivations behind Internet media 
and technology preferences. Research has demonstrated that 
media technology uses and preferences arise contextually, 
based on the specific social relationships they mediate. Use 
of Facebook is no exception. Features like the Wall, chat, 
notes, ads, and the Beacon service blur the lines between 
online and offline communication practices; this trend is 
familiar to students, but may cause difficulty for librarians who 
are new to the network. Ethnographic studies of online 
communities have shown that individuals employ Internet 
technologies to create subjective meanings, identities, and 
community values. Social networking sites like Facebook are 
structured around and employ unique cultural and linguistic 
conventions to which librarians may not be sensitive. All of 
these features, affordances, opportunities, and concerns 
indicate that more research is needed before libraries and 
other academic institutions can become full and appropriate 
participants in social spaces like Facebook. 


Many students, however, appear to be uncomfortable with 
librarians’ profiles because they present an identity that does 
not reconcile with common student perceptions about 
librarians. While some participants found personal profile 
information engaging, others felt that encountering librarians 
as “real people” on Facebook made them uncomfortable. For 
librarians attempting to create successful individual profiles, 
striking an effective personal-professional balance is critical. 
A template for constructing effective individual librarian 
profiles is difficult to formulate. Success will depend in part 
on the individual librarian and the pre-existing offline rapport 
he or she has with students. Note that students will sometimes 
eagerly “friend” a popular professor, even though many of 
these same students would claim that the presence of 
professors on Facebook is generally “creepy.” The personality 
of the librarian, both online and offline, will have a critical 
bearing on the effectiveness of his or her profile. 


So, it is not advisable for librarians to hide from SNSs 
or to ban them. Instead, they should figure out ways to use 
them that benefit the students and the organization or 
institute to which the library is attached, and that ultimately 
create positive teaching moments. But we should not forget 
the following: 


When developing a virtual presence, consider 
students’ actual perceptions of the library: While students 
had mixed feelings about using Facebook to communicate 
with individual librarians, many liked the idea of receiving 
information from the library through Facebook. An 
organizational page can be maintained at little cost, allowing 
users to interact with the organization as a whole. 
Organizational pages can contain applications, news and 
information, and links to individual employees or services - 
exactly the kind of content students indicated that they would 
like to access. This type of official presence also skirts many 
social and practical complications posed by individual student- 
librarian Facebook “friendships” while providing many of the 
same services initially intended in the “Your Librarian is Your 
Friend” campaign. 

Consider how students view and interact with 
librarians in the real world : One assumption prevalent among 
library forays into Facebook is that the measure of 
effectiveness is whether students can be persuaded to “friend” 
their librarians. It may be more appropriate to think of 
Facebook as a resource for enhancing face-to-face 
relationships or brand awareness, as is the case in many 
commercial applications. An examination of trade literature 
and blogs on commercial marketing through Facebook offers 
other ideas for socially acceptable interactions with customers 
or patrons through the site. Means of interacting with 
customers through Facebook include placing advertisements, 
developing applications, and maintaining an organizational 
page. While “friending” also has a place in commercial 
marketing strategies, it is not the sole strategy, as has been 
the case in many library initiatives. Facebook ads offer an 
effective means of sending targeted messages to students 
or other patron groups. Students may be comfortable 
receiving messages from a centralized library presence, so 
a library ad recruiting participants for a study or announcing 
a library event may be received positively. 


Provide resources, but do not be surprised if students 
do not use them : Although some students may not have 
any interest in Facebook’s library applications, many said they 
would be inclined to use them. Similarly, receptive students 
typically found study tips, for example regarding what 
constitutes a “scholarly source”, and journal or catalogue 
search bars on librarians’ profiles to be appealing and useful. 
Convenience was an important factor; some students liked 
the idea of accessing study resources without having to 
“leave” Facebook. A critical practical issue is whether or not 
the library resources available through Facebook applications 
are more useful and convenient than conventional channels 
of access. If using a library application requires the student 
to manually install it to his or her profile, some will not use the 
application because it seems an unnecessary hassle. But 
students may also avoid installing such an application 
because it could make their profiles appear bookish to peers. 


Recognize that Facebook may play a role in librarians’ 
lives too : Many librarians were using Facebook for personal 
communication and networking prior to the “Your Librarian is 
Your Friend” campaign. In some instances, students found 
this authentic use interesting; in other cases, it made them 
uncomfortable. In addition to student perceptions, however, 
it is important to consider the personal impacts on librarians 
of making a previously personal space an access point for 
students. Negotiating the line between personal and 
professional spaces will be difficult for librarians and other 
educators, just as it is for students. 


But Facebook has continued to launch new features 
that provide expanded services both within and outside the 
site. Use for commercial and political purposes has grown, 
and the behavioural norms within the site have become more 
formally articulated. As with any social environment, however, 
these norms and uses are constantly changing, so librarians 
participating effectively in this online medium will find social 
attentiveness and acumen to be just as necessary as 
professional and technological savvy. And while librarians 
should accept that their presence on Facebook may be 
unwelcome to some students, they should not be dissuaded 
from exploring the site and capitalizing on the promise that 
social networking tools offer for new and exciting library 
initiatives. 

Moreover, libraries are now ready to provide information 
on mobile phones as well. A mobile database, specially 
designed for handhelds and announced by the Chemical 
Abstracts Service, a division of the American Chemical 
Society, became available in late 2005 and may portend the 
arrival of all sorts of databases and library services in 
portable formats. 


Soon, librarians say, students and scholars in law, 
business, and perhaps even the humanities will start using 
handheld devices to gain convenient access to library 
databases. “The content for handhelds is going to get better 
and better,” says Lori Bell, a librarian at the Mid-Illinois Talking 
Book Center who founded a blog called the Handheld 
Librarian. Future generations of college students already use 
handheld devices and will come to expect information to be 
available where they want it, when they want it, she says. 
Databases for handhelds are now used extensively in medical 
disciplines. Doctors and students at medical schools can refer 
to medical dictionaries, drug-interaction guides, patient 
records, and other databases that have been downloaded to 
handheld computer devices. PubMed, a popular database 
managed by the National Library of Medicine, is now available 
in an abridged, miniaturized form for handhelds. 

So the future is now open. Michael Buckland describes 
the three phases of modern and future libraries as the paper 
library, wherein materials collected and technical operations 
are based largely on paper; the automated library, which sees 
the computerization of most operations while collections 
remain largely paper; and the electronic library, wherein both 
operations and collections for the most part originate, are 
stored, and are used in electronic formats. Clifford Lynch 
distinguishes between an era of modernization, in which 
technology is employed to continue to do what [librarians] 
have been doing, but in a more efficient and/or cost-efficient 
way, and an era of transformation, where librarians use new 
technology to change processes in a fundamental way. 


Buckland and Lynch would likely agree that, just as 
information technology in the classroom and as a scholarly 
communication tool has moved into takeoff, so too have 
academic libraries moved into a critical takeoff phase between 
automation and digitization, between modernization and 
transformation. Just how academic libraries will be defined 
in 5 or 10 or 20 years is less important than the 
incontrovertible fact that they will be highly digital and 
probably largely digital. 


Along with a shift from the largely paper to the largely 
digital library comes a shift away from the model of library as 
locus for information. The proliferation of digital resources, 
services, and tools increasingly aids the delivery of information 
to the desktop, with an increasing proportion of these 
connections occurring directly between information consumer 
and information producer. As libraries digitize collections and 
provide more and more direct access, they also must seek 
ways to provide their full range of services over the network, 
either digitally or through real-time interactions. As the library 
truly becomes more user-centered and provides information 
and information access to the desktop, it becomes more a 
concept with emphasis on services than a place with 
emphasis on collections. It should be little surprise, then, that 
the role of the academic librarian is now rapidly shifting, as 
has been anticipated for some time, from information provider 
to information access consultant. 


The term “digital library” unnerves many librarians 
because it seems to preclude so much they know and value, 
but they need not assume that this means an elimination of 
the constituent features or values of contemporary academic 
library services. Most technologists writing on this subject 
understand that the future is uncertain and recognize that 
discussions of digital libraries explore just one component in 
the comprehensive information services that will evolve over 
the next decade or two. Librarians must view these 
discussants as partners, not opponents, and must insinuate 
themselves even into theoretical discussions of the digital 
library, contributing to the dialogue a recognition of the need 
to make such resources available to all through the parallel 
development and delivery of value-added and values-based 
services created and maintained by librarians. In particular, 
librarians must recast the long-lived service values of equity 
of access, personal service, and services tailored to individual 
needs into such newly emerging values as technology 
integration, holistic computing, delivery of core services 
through the network, special efforts to make the technology 
work for all, and collaboration across administrative lines. 


5 
Internet and Internet Resources 


The world is on the threshold of a major revolution in 
global information sharing and easy international 
communications. The Internet, the de facto communication 
medium of the information age, has become a household 
name. The fantastic growth of the Net, doubling in size every 
three months, is really exciting. Within a few years, Internet 
connectivity will be available by right, in much the same way 
as access to a telephone. There is a wide variety of user 
services available on the Internet, some Internet-wide and 
some within a specific user community. The Internet contains 
a wealth of information, if only you could find it. 


Now the question arises: what is the Internet? Let us try 
to understand it. 


5.1. WHAT IS INTERNET & HOW HAS IT EVOLVED? 


Internet stands for Inter Network systems. This is a 
logical, global, meta network of computer networks having 
no political boundaries. 


Each network supports the technical standards needed 
for interconnection (the TCP/IP family of protocols and a 
common method for identifying computers), but in many ways 
the separate networks are very different. The various 
sections of the Internet use almost every kind of 
communication channel that can transmit data. They range 
from fast and reliable to slow and erratic. They are privately 
owned or operated as public utilities. They are paid for in 
different ways. 
The internet is sometimes called an “information super 
highway.” A better comparison would be to the international 
transportation system, which includes everything from airlines 
to dirt tracks. 
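
The “common method for identifying computers” mentioned above is, in practice, the IP address. As a small illustration only, the following Python sketch uses the standard `ipaddress` module to test whether a computer's address falls inside a given network. The network and host addresses are hypothetical, taken from address blocks reserved for documentation, and the function name is made up for this example.

```python
import ipaddress

# Hypothetical campus network and hosts (documentation-only address blocks).
campus = ipaddress.ip_network("192.0.2.0/24")
local_host = ipaddress.ip_address("192.0.2.17")
remote_host = ipaddress.ip_address("198.51.100.5")


def belongs_to(address, network):
    """Return True if the address lies inside the given network."""
    return address in network


print(belongs_to(local_host, campus))   # True: same network
print(belongs_to(remote_host, campus))  # False: a different network
print(campus.num_addresses)             # 256 addresses in a /24 block
```

Whatever the underlying channel, each separate network only needs to agree on this addressing scheme for its computers to be reachable from the others.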


The Internet is often described as an Information 
Revolution, as it affects the entire human race. Albert Gore 
advocated networks forming an Information Superhighway to 
link scientists, business people, educators and students for 
efficiently processing and dealing with information. His 
Information Superhighway is nothing but the Internet. Today 
we all talk about it, but the Internet did not develop 
overnight. The computer is an essential component of the 
Internet, so the development of the Internet can be traced 
back to the invention of the abacus, followed by the 
development of the slide rule in 1622, the mechanical 
calculator in 1647, and the automated loom in 1820. The use 
of punched cards was an early step in the development of the 
computer and of the Internet; we can say that these were the 
first forms of primitive programming. In the 1830s, Charles 
Babbage laid the foundation for the computer with his 
Analytical Engine. This invention is regarded as the first 
forerunner of the Internet. The second forerunner was the 
invention of the telephone in 1876. Other inventions which 
played an important role in the development of the Internet 
are the microchip in 1954, the minicomputer in 1968, the 
microprocessor in 1970, the microcomputer in 1974, and the 
floppy disk in 1975. 


Other helpful developments were the commercial 
personal computer and supercomputer in 1980, and the 
Microsoft Disk Operating System (MS-DOS) in 1981, which 
made Internet services much simpler and more effective. In 
1981, the development of modems with a speed of 300 bps 
added to the development of the Internet. In 1996, the 
Optical Carrier arrived with a speed of 122 megabits per 
second, which gave the Internet a fast speed to work at. 
Finally, the Pentium PC came into existence with greater 
speed and a larger memory. More user-friendly operating 
systems like Windows 95 and Windows 98 came later and made 
Internet services faster and easier to use. 


Later, in the 21st century, Windows XP, Windows Vista 
and Windows Server software arrived as better operating 
systems, along with PCs with dual-core processor technology. 
And with advancement in communication technology, we now 
have higher bandwidth for data transfer and Wi-Fi systems to 
access data in a wireless environment. 


But all the above developments were only supplementary 
causes in the origin of the Internet. If we talk about the 
direct origin and evolution of the Internet, the networks that 
make up the Internet originated in two ways. One line of 
development was the local-area networks that were created 
to link computers and terminals within a department or an 
organization. Many of the original concepts came from the 
Xerox Corporation’s Palo Alto Research Center (PARC). 
Universities were pioneers in expanding small local networks 
into campus-wide networks. The second source of network 
developments was the national wide-area networks. 


Additionally, the following factors also led to the 
emergence of the Internet: 


• Sharing of resources and information. 
• Drop in the price of computers and other electronic gadgets. 
• Saving time in communication. 
• Making searches easier through networks. 
• Spiralling increase in the prices of journals and books. 
• Decreasing budgets. 
• Avoiding duplication in the holdings of the library, and 
• More and more information at lesser and lesser cost. 


As such, historians can argue over the combination 
of financial, organizational, and technical factors that led to 
the acceptance of the ARPAnet technical standards. Several 
companies made major contributions to the development and 
expansion of the Internet, but the leadership came from two 
U.S. government organizations: the Defense Advanced 
Research Projects Agency (DARPA) and the National 
Science Foundation. 


So the Internet, in the real sense, came into existence 
in 1969, when the Advanced Research Projects Agency 
(ARPA), an agency of the US Government whose computer 
research office had earlier been headed by J.C.R. Licklider, 
developed a network called ARPANET with the financial help 
of the Department of Defence. The network was mainly 
experimental and intended for military defence purposes. It 
was used in research and to develop and test networking 
technologies. At first, four host computers at four separate 
institutions in the USA were connected. By 1972, 37 host 
computers were connected to it. England and Norway were 
connected with ARPANET in 1973. In 1983, ARPANET split 
into two networks, namely ARPANET and MILNET, in order 
to keep non-military and military network sites separate. The 
Telnet application for remote log-in was developed by 1972, 
and in 1973 the File Transfer Protocol (FTP), which 
standardizes the transfer of files between networked 
computers, was introduced and made work on the Internet 
easier. 


In 1986, the National Science Foundation (NSF) 
connected the nation’s six supercomputing centres together 


416 Manual of Digital Libraries 


and called this the NSFNET or NSFNET backbone. In 1987, 
NSF awarded a grant to Merit Network Inc. to operate and 
manage the future development of the NSFNET backbone. Merit 
Network Inc. collaborated with the International Business 
Machines (IBM) Corporation and MCI Telecommunications 
Corporation to research and develop faster networking 
technologies. The NSFNET backbone was upgraded in 1989 to 
T1 lines, able to transmit data at 1.5 million bits per 
second, or about 50 pages of text per second. In 1993, the 
T1 lines were upgraded to T3, able to transmit data at 45 
million bits per second, or about 1,400 pages of text per 
second. In 1995, the NSFNET backbone was replaced by a new 
network architecture, called vBNS (very high speed Backbone 
Network Service), that utilizes Network Service Providers, 
Regional Networks and Network Access Points (NAPs). 
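
The page rates quoted for the T1 and T3 backbones follow from 
simple arithmetic. The sketch below assumes that one page of 
text holds about 3,750 one-byte characters (30,000 bits); this 
page size and the function name are assumptions chosen for 
illustration, not figures taken from the text. 

```python
# Rough conversion from line speed to pages of text per second.
BITS_PER_BYTE = 8
ASSUMED_BYTES_PER_PAGE = 3750  # assumption: about 3,750 characters per page


def pages_per_second(bits_per_second, bytes_per_page=ASSUMED_BYTES_PER_PAGE):
    """Return how many pages of plain text a link can carry each second."""
    return bits_per_second / (bytes_per_page * BITS_PER_BYTE)


t1_pages = pages_per_second(1_500_000)    # T1: 1.5 million bits per second
t3_pages = pages_per_second(45_000_000)   # T3: 45 million bits per second
print(f"T1: {t1_pages:.0f} pages/s, T3: {t3_pages:.0f} pages/s")
```

Under this assumed page size the T1 rate works out to 50 pages 
per second and the T3 rate to 1,500, which is of the same order 
as the figures of 50 and 1,400 quoted above. 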


The 1990s saw a sweeping change in the way we live, work 
and interact. Information and communication have become the 
basis for much of the world's post-industrial society. The 
changing global, political and economic climates were creating 
more opportunities and more challenges in every walk of life. 
As a part of daily life, people were striving hard to have 
more and more information in their field of work. Computers, 
fax machines, cellular telephones, pagers, and other 
“information appliances” became common not only in the office 
but also in the home. The distinction between home and office 
was becoming blurred, and a sea change in the information 
technology scenario lay ahead. 


In 1991, the University of Minnesota developed 
Gopher, which provides a hierarchical, menu-based method 
for providing and locating information on the Internet and 
makes the use of the Internet much easier. The year 1993 was 
a landmark in the development of the Internet, when the 
European Laboratory for Particle Physics (CERN) in Switzerland 
released the World Wide Web (WWW). The WWW was developed 




by Tim Berners-Lee. Another development came in 1993-94, 
when the graphical web browsers Mosaic and Netscape Navigator 
were introduced and spread through the Internet community. 


By 1998, the number of networks on earth had crossed 
65,000, with more than 16 million computers connected to 
them; more than 100 million users, spread over 140 countries, 
were getting information through them. Later, increased 
bandwidth and the development of Wi-Fi technology in the 
first decade of the 21st century made the Internet an 
essential. 


But the growth of the Internet in India has been slow as 
compared to Western Europe and South East Asia. However, 
it has been assumed that Internet access in India will grow at 
the rate of 164 percent in the next few years as compared to 
the global rate of 59 percent. The traffic on the Internet is 
doubling every hundred days. The Internet has undergone a 
virtual explosion, due to the enormous popularity of the World 
Wide Web. At present more than 4000 Indian users have access 
to the Internet through VSNL, and 7000 users from the 
education and research community through ERNET. This amounts 
to a very small fraction of the total population of India. The 
Dept. of Telecommunication (DoT) and VSNL are the two key 
public sector bodies controlling the Internet till date. VSNL 
has a presence in only 16 cities all over India. The situation, 
however, changed after 2000, and the Internet has become 
widely accepted. 


5.1.1. Advantages of Internet 


The Internet helps to solve all sorts of communication 
problems for individuals, small businesses and large 
corporations, government departments, researchers, scholars 
and all sorts of information seekers alike. That is why the 
number of participants in the Internet has doubled every year 
since 1988. A new member of the Internet may be overwhelmed by 




the features and capabilities of the net. The potential of 
the Internet becomes clear if we understand the main features 
of the system. These are: 


• The Internet is a medium for effectively communicating 
with others. 

• It is a research support and information retrieval 
mechanism. 

• The Internet is flexible in cost and features. 

• It is at once a local and international entity, allowing 
interaction among users separated by an office wall or 
by an ocean. 

• The Internet is not a specific piece of software or 
hardware; rather, it is a network of information networks. 

• It is not a single network, but a group of networks 
logically arranged in a hierarchy. 

• The Internet is not owned by any government, corporation 
or university. 

• It is not the same everywhere, but vastly different. 

• It is not restricted to research only. People from all 
walks of life are making use of it for their day-to-day 
activities. 

• Apart from computer professionals and engineers, it is 
being used daily by people with some background 
interest in the field. 

• It is a collection of thousands of computer networks, 
tens of thousands of computers and more than ten 
million users who share a compatible means for 
interacting with one another to exchange information 
located in separate geographical areas. 


5.1.2. Disadvantages of Internet 


Though Internet offers many advantages and 


applications, it also has some problems. Some of them are: 

• It seems to be a virtual mess. 

• Setting up and effectively running an intranet requires 
skilled personnel. 

• Information sources may be of doubtful quality, as there 
is no control over them. 

• Maintenance is one of the big problems posed by 
intranets. Setting up an internet is easy, but proper 
maintenance and updating is a difficult task. 

• It is difficult to navigate through it. 

• Both staff members and users should be trained 
properly for effective use of the internet. 

• It is difficult to control non-productive use. 

• Security is another important problem posed by the 
internet. Care should be taken that users do not misuse 
the internet to disturb access to it. 

• Noise in telecommunication lines leads to traffic jams 
over the net. 


5.1.3. How Does the Internet Work? 


To understand how information is moved through the 
Internet, we can draw an analogy with postal mail. When you 
post a letter, it is taken to the local post office and sorted 
on the basis of its address. The postal service may choose a 
route that delivers the letter through different intermediate 
stations. Similarly, when you send a message or command from 
your computer to another Internet computer, the circuit that 
carries your message may also be carrying messages and signals 
bound for multiple locations, yet at the end points it gives 
the appearance of a dedicated line. 




Once connectivity to the Internet is established, protocols 
govern the exchange. TCP/IP requires that data be broken into 
small chunks called packets, with the address of the 
destination computer at the beginning. Every computer on 
the Internet has a unique address, which can be represented 
in one of two ways. An IP address is like a telephone number 
(a 32-bit number, e.g. 250.50.15.6). An Internet address has 
two components: the network identity and the host identity. 
Addresses are allocated by network number. 
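
The structure of an IP address described above can be sketched 
with Python's standard ipaddress module. The example treats the 
first 24 bits as the network identity; that prefix length and 
the function name are assumptions made for illustration, since 
the text does not say where the boundary falls. 

```python
import ipaddress


def split_address(dotted, prefix_len):
    """Return (32-bit integer, network identity, host identity) for an IPv4 address."""
    iface = ipaddress.ip_interface(f"{dotted}/{prefix_len}")
    as_int = int(iface.ip)                       # the address as one 32-bit number
    network = str(iface.network.network_address)
    host = as_int & int(iface.network.hostmask)  # bits outside the network prefix
    return as_int, network, host


# The address is the one used in the text; the /24 prefix is an assumption.
as_int, network, host = split_address("250.50.15.6", 24)
print(f"{as_int:032b}")   # the same address written out as 32 bits
print(network, host)
```

With a different prefix length the same 32 bits divide into 
network and host parts at a different point. 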


TCP is the upper layer protocol that relies upon IP. TCP 
guarantees that the packets sent through the Internet are 
properly packaged and transmitted in a reliable fashion. Once 
armed with the IP address, the computer can send the 
information in the right general direction until it reaches 
another computer, which reads the destination address and 
sends it along. TCP/IP and the numeric addresses known as IP 
addresses are introduced in panel 5.1. Another way to identify 
a computer on the Internet is to give it a name, such as 
tulip.mercury.cmu.edu. Names of this form are known as 
domain names, and the system that relates them to IP 
addresses is known as the domain-name system (DNS). The 
Domain Name System maps domain names to Internet addresses. 
To function as part of the Internet, a host needs a domain 
name that has an associated Internet Protocol (IP) address 
record. This includes any computer system connected to the 
Internet via full or part-time, direct or dial-up connections. 
DNS servers perform the necessary function of translating back 
and forth between names and numbers. These servers contain 
databases of IP addresses and corresponding domain names, 
and they are interrogated each time a user wants to send an 
e-mail or request data over the World Wide Web. A top-level 
domain name (TLD) can be either an ISO country code or 
one of the generic top level domains. 
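
The translation between names and numbers can be illustrated 
with a toy lookup table. This is only a sketch: the table 
stands in for a DNS server's database, the pairing of the name 
with the address is hypothetical, and the two-letter rule for 
recognising country codes is a simplifying assumption. 

```python
# A toy name table standing in for a DNS server's database.
# The pairing of this name with this address is hypothetical.
NAME_TO_ADDRESS = {"tulip.mercury.cmu.edu": "250.50.15.6"}
ADDRESS_TO_NAME = {addr: name for name, addr in NAME_TO_ADDRESS.items()}


def resolve(name):
    """Translate a domain name to an IP address, or None if unknown."""
    return NAME_TO_ADDRESS.get(name.lower().rstrip("."))


def reverse_lookup(address):
    """Translate an IP address back to a domain name, or None if unknown."""
    return ADDRESS_TO_NAME.get(address)


def top_level_domain(name):
    """Return the last label of a domain name, e.g. 'edu'."""
    return name.rstrip(".").rsplit(".", 1)[-1].lower()


def tld_kind(name):
    """Classify the TLD: two-letter labels are taken to be ISO country codes."""
    return "country-code" if len(top_level_domain(name)) == 2 else "generic"


print(resolve("tulip.mercury.cmu.edu"), tld_kind("tulip.mercury.cmu.edu"))
```

On a computer with a working Internet connection, the standard 
library call socket.gethostbyname performs a real lookup of 
this kind. 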


Panel 5.1. TCP/IP 


The two basic protocols that form the Internet are TCP 
and IP. One sees them mentioned together (TCP/IP) 
so often that it is easy to forget that these abbreviations 
represent two separate protocols. 


The Internet Protocol (IP) interconnects the separate 
network segments that constitute the Internet. Every 
computer on the Internet has a unique address, known 
as an IP address. The address (for example, 
250.150.15.6) consists of four numbers, each in the 
range 0-255. Within a computer these are stored as 
four bytes. When printed, the convention is to separate 
them with periods as in this example. The Internet 
Protocol enables any computer on the Internet to 
dispatch a message to any other. The various parts of 
the Internet are connected by specialized computers 
known as routers. As their name implies, routers use 
the IP address to route each message on the next 
stage of the journey to its destination. 


On the Internet, messages are transmitted as short 
packets, typically a few hundred bytes in length. A 
router simply receives a packet from one segment of 
the network and dispatches it on its way. An IP router 
has no way of knowing whether the packet ever 
reaches its ultimate destination. Users of the network 
are rarely interested in individual packets or network 
segments. They need reliable delivery of complete 
messages from one computer to another. This is the 
function of the Transmission Control Protocol (TCP). An 
application program at the sending computer passes 
a message to the local TCP software. TCP takes the 
message, divides it into packets, labels each with the 
destination IP address and a sequence number, and 
sends them out on the network. At the receiving 
computer, each packet is acknowledged when 
received. The packets are reassembled into a single 
message and handed over to an application program. 
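
The division of a message into numbered packets and its 
reassembly can be sketched in a few lines. This is a 
simplified illustration, not the TCP software itself: the 
packet layout, the packet size of 8 bytes and the function 
names are assumptions made for the example. 

```python
import random


def to_packets(message, dest_ip, size=8):
    """Divide a message into packets labelled with destination and sequence number."""
    return [
        {"dest": dest_ip, "seq": seq, "data": message[start:start + size]}
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets):
    """Put packets back in sequence order and join their payloads."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    return b"".join(p["data"] for p in ordered)


message = b"Reliable delivery of complete messages"
packets = to_packets(message, "250.150.15.6")
random.Random(1).shuffle(packets)    # packets may arrive out of order
acks = [p["seq"] for p in packets]   # the receiver acknowledges each packet
rebuilt = reassemble(packets)
print(rebuilt == message)
```

The sequence numbers are what allow the receiving computer to 
restore the original order, whatever route each packet took. 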




TCP should be invisible to the user of a digital library, 
but the responsiveness of the network is greatly 
influenced by the protocol and this often affects the 
performance that users see. Not all packets arrive 
successfully. A router that is overloaded may simply 
ignore (“drop”) some packets. If this happens, the 
sending computer never receives an acknowledgment. 
Eventually it gets tired of waiting and sends the packet 
again. This is known as a “time-out.” It is perceived by 
the user as an annoying delay. 
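
The time-out behaviour can be imitated with a small 
simulation. The sketch below is a deliberately simplified 
model, assuming that each packet is sent and acknowledged one 
at a time and that losses occur at random; the drop rate, the 
retry limit and the function name are all assumptions. 

```python
import random


def send_with_retransmit(packets, drop_rate, rng, max_tries=20):
    """Send each packet until acknowledged; return delivered data and time-out count."""
    delivered, timeouts = [], 0
    for seq, data in enumerate(packets):
        for _ in range(max_tries):
            if rng.random() >= drop_rate:   # packet and acknowledgment got through
                delivered.append((seq, data))
                break
            timeouts += 1                   # no acknowledgment: time out, send again
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return delivered, timeouts


rng = random.Random(42)   # fixed seed so that the run is repeatable
packets = [b"page-1", b"page-2", b"page-3", b"page-4"]
delivered, timeouts = send_with_retransmit(packets, drop_rate=0.3, rng=rng)
print(len(delivered), "packets delivered after", timeouts, "time-outs")
```

Every time-out in the simulation corresponds to a delay that 
the user would notice, even though all the data arrives in the 
end. 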


TCP guarantees error free delivery of messages, but 
it does not guarantee that they will be delivered 
punctually. For some applications, punctuality is more 
important than complete accuracy. Suppose one 
computer is transmitting a stream of audio that another 
is playing immediately on arrival. If an occasional 
packet fails to arrive on time, the human ear would 
much prefer to lose tiny sections of the soundtrack 
rather than wait for a missing packet to be 
retransmitted, which would be horribly jerky. Since 
TCP is unsuitable for such applications, they use an 
alternate protocol, named UDP, which also runs over 
IP. With UDP, the sending computer sends out a 
sequence of packets, hoping that they will arrive. The 
protocol does its best, but makes no guarantee that 
any packets ever arrive. 
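
A minimal sketch of UDP in use is given below, with Python's 
standard socket module. Both ends run on the same computer 
(the loopback address 127.0.0.1), which is an assumption made 
so that the example needs no network; the function name and 
the payloads are hypothetical. 

```python
import socket


def udp_roundtrip(payloads, timeout=1.0):
    """Send datagrams to a local receiver and return those that arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    received = []
    try:
        receiver.bind(("127.0.0.1", 0))       # port 0: let the system pick a free port
        receiver.settimeout(timeout)
        address = receiver.getsockname()
        for payload in payloads:
            sender.sendto(payload, address)   # no connection and no acknowledgment
        for _ in payloads:
            try:
                data, _addr = receiver.recvfrom(2048)
            except socket.timeout:
                break                         # a missing datagram is simply skipped
            received.append(data)
    finally:
        sender.close()
        receiver.close()
    return received


frames = [b"audio-0", b"audio-1", b"audio-2"]
arrived = udp_roundtrip(frames)
print(len(arrived), "of", len(frames), "datagrams arrived")
```

Nothing in the code retransmits a lost datagram; the receiver 
plays whatever arrives, which is the trade-off described above. 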


To register a second level domain name or a third level 
domain name a user needs to apply to the domain name 
registry with the delegated authority for the TLD or gTLD. 
Some registries publish data on the number of registrations 
on a monthly basis while others publish intermittently. The 
Internet Hosts surveys undertaken by Network Wizards and 
RIPE also provide, as a by-product, an indicator of the number 
of registrations under each domain. The main importance of 
DNS indicators is that they can be used to inform discussions 
over the different policies and prices of TLD and gTLD 
registries. Whether the registration process under a certain 




domain name is subject to industry self-regulation or 
government oversight, the availability of DNS data is important 
to ensure transparency in registration management for service 
providers, business users and consumers. 


This is particularly important in those cases where a 
monopoly or monopoly power exists in the registration of 
second and third level domain names. The various ‘Whois’ 
tools can provide useful information for constructing certain 
Internet indicators, although this information is not always 
reliable and some users would like to see additional 
information or functionality included. The United States White 
Paper “A Proposal to Improve the Technical Management of 
Internet Domain Names and Addresses Discussion Draft” 
contained a number of suggestions for the type of information 
that should be included in domain registration databases, and 
it will be up to the new DNS authority to work through 
guidelines in this area. Internet Protocol (IP) addresses are 
the numbers used to identify computers, or other devices, on a 
TCP/IP network. 


Networks using the TCP/IP protocol route messages 
based on the IP address of the destination. The format of an 
IP address is a 32-bit numeric address written as four numbers 
separated by periods. Computers that support the TCP/IP 
protocols usually provide a standard set of basic applications. 
These applications are known as the TCP/IP suite. Some of 
the most commonly used are listed in panel 5.2. 


Panel 5.2 The TCP/IP suite 


The TCP/IP suite is a group of computer programs 
(based on TCP/IP) that are provided by most modern 
computers. They include the following. 


Terminal emulation: A program known as Telnet allows a 
personal computer to emulate an old-fashioned 
computer terminal that has no processing power of its 
own and relies on a remote computer for processing. 
Since it provides a lowest-common-denominator user 




interface, Telnet is often used for system 
administration. 


File transfer: The protocol for moving files from one 
computer to another across the Internet is the file 
transfer protocol (FTP). Since FTP was designed to 
make use of TCP, it is effective for moving large files 
across the Internet. 


Electronic mail: Internet mail uses the Simple Mail 
Transfer Protocol (SMTP). This is the protocol that 
turned electronic mail from a collection of local 
services to a single world-wide service. It provides a 
basic mechanism for delivering mail. In recent years, 
a series of extensions have been made to allow 
messages to include wider character sets, permit 
multimedia mail, and support the attachment of files to 
mail messages. 
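
The mail extensions mentioned above (wider character sets and 
file attachments) can be seen in a message built with Python's 
standard email package. This is a sketch only: the addresses, 
subject, file name and helper function are hypothetical, and 
the message is constructed but not sent. 

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, attachment, filename):
    """Build a mail message with a text body and one attached file."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)   # text body; non-ASCII characters are encoded as needed
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg


# All values below are hypothetical examples.
msg = build_message("reader@example.org", "librarian@example.org",
                    "Catalogue record", "Résumé of the record is attached.",
                    b"\x00\x01binary data", "record.bin")
wire_form = msg.as_bytes()   # the bytes that a mail program would hand to SMTP
print(msg.get_content_type())
```

Handing such a message to a mail server is the job of SMTP 
itself; in Python that step is performed by the smtplib module. 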


5.1.4. The Internet Community 


The Internet pioneered the concept of open standards. 
In 1997, Vinton Cerf and Robert Kahn received the National 
Medal of Technology for their contributions to the Internet. 
The citation praised their work on the TCP/IP protocols, but it 
also noted that they had “pioneered not just a technology, 
but also an economical and efficient way to transfer that 
technology,” and that they had “steadfastly maintained that 
their internetworking protocols would be freely available to 
anyone.” TCP/IP, the citation continued, “was deliberately 
designed to be vendor-independent to support networking 
across all lines of computers and all forms of transmission.” 


Panel 5.3. NetNews 


The NetNews bulletin boards (also known as Usenet) 
are an important and revealing example of the Internet 
community's approach to the open distribution of 
information. Thousands of bulletin boards, called 
newsgroups, are organized in a series of hierarchies. 




The highest-level groupings include comp, rec, the 
notorious alt, and many more. For example, 
rec.arts.theatre.musicals is a bulletin board for 
discussing musicals. 


The NetNews system is so decentralized that no one 
has a comprehensive list of all the newsgroups. An 
individual who wishes to post a message to a group 
sends it to the local news host. This passes it to its 
neighbours, who pass it to their neighbours, and so 
on. 
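
The way a posting spreads from neighbour to neighbour can be 
sketched as a simple flooding procedure. The host names and 
links below are hypothetical, and the real news software is 
considerably more elaborate; the sketch shows only the idea 
that each host passes an article on once and ignores copies it 
already holds. 

```python
from collections import deque


def flood(neighbours, origin):
    """Return the order in which hosts first receive an article posted at origin."""
    seen = {origin}
    order = [origin]
    queue = deque([origin])
    while queue:
        host = queue.popleft()
        for peer in neighbours.get(host, ()):
            if peer not in seen:   # a host ignores articles it already holds
                seen.add(peer)
                order.append(peer)
                queue.append(peer)
    return order


# Hypothetical news hosts and their neighbour links.
neighbours = {
    "local": ["a", "b"],
    "a": ["local", "c"],
    "b": ["local", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
reached = flood(neighbours, "local")
print(reached)
```

No host in the sketch needs a list of all the others, which 
mirrors the decentralization described in the panel. 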


NetNews is the exact opposite of a digital library in 
the sense that NetNews information is entirely 
unmanaged. There are essentially no restrictions on 
who can post or what one can post. At its worst the 
system distributes libel, hate, and simply wrong 
information, but many newsgroups work remarkably 
well. For example, people around the world who use 
the Python programming language have a newsgroup 
(comp.lang.python) in which they exchange technical 
information, pose queries, and communicate with the 
language's developer. 


The Internet tradition continues to emphasize 
collaboration on technical matters, and the continuing 
development of the Internet remains firmly in the hands of 
engineers. Some people seem unable to accept that the U.S. 
government is capable of anything worthwhile, but the creation 
of the Internet was led by government agencies, often against 
strong resistance by companies who now profit from its 
success. Recently, attempts have been made to rewrite the 
history of the Internet to advance vested interests, and 
individuals have claimed responsibility for achievements that 
many shared. There is a striking contrast between the 
coherence of the Internet—coordinated by far-sighted 
government officials—and the mess of incompatible standards 
in areas left to commercial competition, such as mobile 
telephones. 




An important characteristic of the Internet is that the 
engineers and computer scientists who develop and operate 
it are heavy users of their own technology. They 
communicate by email, dismissing conventional mail as “snail 
mail.” When they write a paper, they compose it at their own 
computer. If it is a web page, they insert the markup tags 
themselves rather than use a formatting program. Senior 
computer scientists may spend more time preparing public 
presentations than writing computer programs, but 
programming is the basic skill that everybody is expected to 
have. 


5.1.5. Scientific Publishing on the Internet 


The publishing of serious academic materials on the 
Internet goes back many years. Panels 5.4 and 5.5 describe 
two important examples— the Internet RFC series and the 
Physics E-Print Archives at the Los Alamos National 
Laboratory. Both are poorly named. “RFC” once stood for 
“Request for Comment,” but the RFC series is now the 
definitive technical series for the Internet. It includes a variety 
of technical information and the formal Internet standards. 
The Los Alamos service is not an archive in the usual sense. 
Its primary function is as a “preprint server”: a site where 
researchers can publish research as soon as it is complete, 
without the delays of conventional journal publishing. 


Panel 5.4. The Internet Engineering 
Task Force and the RFC Series 


The Internet Engineering Task Force is the body that 
coordinates technical aspects of the Internet. Its 
methods of working are unique, yet it has proved 
extraordinarily good at getting large numbers of 
people, many from competing companies, to work 
together. The first unusual feature is that the IETF is 
open to all. Anyone can go to meetings, join working 
groups, and vote. 




The IETF’s basic principle of operation is “rough 
consensus and working code.” Anyone who wishes to 
propose a new protocol or some other technical 
advance is encouraged to provide a technical paper 
(called an Internet Draft) and a reference 
implementation of the concept. The reference 
implementation should be in the form of openly 
available software. At meetings of working groups, the 
Internet Drafts are discussed. If there is a consensus 
in favour of going ahead, a draft may be put on the 
RFC standards track. No draft standard can become 
a formal standard until implementations of the 
specification (usually, computer programs) are 
available for everybody to use. 


The IETF began in the United States but is now 
international. Every year, one meeting is held outside 
the United States. Participants, including the leaders 
of working groups, come from around the world. The 
IETF, originally funded by U.S. government grants, is 
now self-sufficient. The costs are covered by meeting 
fees. 


The processes of the IETF are open to all who wish to 
contribute. Unlike some other standards bodies, whose 
working drafts are hard to obtain and whose final 
standards are complex and expensive, all Internet 
Drafts and RFCs are available online. Because of the 
emphasis on working software, the first of two rival 
technical approaches to be demonstrated with 
software that actually works has a high chance of 
acceptance. As a result, the Internet’s core standards 
are remarkably simple. 


Recent IETF meetings have attracted more than 2000 
people; however, because they divide into working 
groups addressing specific topics, a feeling of intimacy 
remains. Almost everyone is a practicing engineer or 
computer scientist. The managers stay home. The 
formal meetings are short and informal, the informal 
meetings long and intense. Many an important 




specification has come out of a late-night session at 
the IETF, with people from competing organizations 
working together. 


Because of its rapid growth, the Internet is always in 
some danger of breaking down technically. The IETF 
is the fundamental reason that it shows so much 
resilience. If a single company controlled the Internet, 
the technology would be as good as the company’s 
senior engineering staff. Because the IETF looks after 
the Internet technology, the world’s best engineers 
work together to deal with anticipated problems. 


The Internet Drafts are a remarkable series of technical 
publications. In science and engineering, most 
information goes out of date rapidly, but journals sit 
on library shelves for ever. Internet Drafts are the 
opposite. Each begins with a fixed statement that 
includes this wording: “Internet-Drafts are draft 
documents valid for a maximum of six months and 
may be updated, replaced, or obsoleted by other 
documents at any time. It is inappropriate to use 
Internet-Drafts as reference material or to cite them 
other than as work in progress.” 


The IETF posts online every Internet Draft that is 
submitted, and it notifies interested people through 
mailing lists. Then, the review begins. Individuals post 
their comments on the relevant mailing list. Comments 
range from detailed suggestions to biting criticisms. 
By the time the working group comes together to 
discuss it, a proposal has been subjected to public 
review by the experts in the field. 


The RFCs are the official publications of the IETF. 
These few thousand publications form a series that 
goes back to 1969. They are the heart of the 
documentation of the Internet. The best known RFCs 
are those that form the standards track. They include 
the formal specification of each version of the IP 
protocol, Internet mail, components of the World Wide 




Web, and many more. Other types of RFC include 
informational RFCs, which publish technical 
information relevant to the Internet. 


Discussions of scientific publishing rarely mention the 
RFC series, yet it is hard to find another set of scientific 
or engineering publications that are so heavily 
reviewed before publication or so widely read by the 
experts in the field. RFCs have never been published 
on paper. Originally available over the Internet by FTP, 
they are now available on the web. 


Whatever the merits of their names, these two services 
are of fundamental importance for the publishing of research 
in their respective fields. They are important also because 
they demonstrate that the digital libraries find new ways of 
doing things. One of the articles of faith within scholarly 
publishing is that quality can be achieved only by peer review, 
the process by which every article is read by other specialists 
before publication. The process by which Internet Drafts 
become RFCs is an intense form of peer review, but it takes 
place after a draft of the paper has been officially posted. 
The Los Alamos service has no review process. Yet both 
have proved to be highly effective methods of scientific 
communication. 


Panel 5.5. The Los Alamos E-Print Archives 


The Physics E-print Archives provide an illuminating 
example of practicing scientists taking advantage of 
the Internet technology to create a new form of 
scientific communication by extending the custom of 
circulating preprints of research papers. The first 
archive was established in 1991 by Paul Ginsparg of 
the Los Alamos National Laboratory to serve the 
needs of a group of high-energy physicists. Later, 
archives were created for other branches of physics, 
mathematics, and related disciplines. In a 1996 
UNESCO talk in Paris, Ginsparg reported: “These 




archives now serve over 35,000 users worldwide from 
over 70 countries, and process more than 70,000 
electronic transactions per day. In some fields of 
physics, they have already supplanted traditional 
research journals as conveyers of both topical and 
archival research information.” 


The primary function of the E-print Archives is to 
present the results of research, often in a preliminary 
version of a paper that will later be published in a 
traditional journal. Papers are prepared for the archives 
in the usual manner. Many physicists use the TeX 
format, but PostScript and HTML are also used. 
Graphs and data are sometimes embedded in text. 


A paper may be submitted to an archive by electronic 
mail, by file transfer (using the FTP protocol), or via 
the web. The author is expected to provide a short 
abstract and a standard set of indexing metadata. The 
processing is entirely automatic. The archives provide 
an electronic-mail-based search service, a web-based 
search system, and an email notification service to 
subscribers. Search options include searching one 
archive, searching many archives, searching by author 
and title, and searching the full text of abstracts. 


The technology of the archives is straightforward. They 
use the standard formats, protocols, and networking 
tools that researchers know and understand. The user 
interfaces have been designed to minimize the effort 
required to maintain the archive. Authors and readers 
are expected to assist by installing appropriate 
software on their own computers and by following 
procedures. 


This is an open-access system, funded through annual 
grants from the National Science Foundation and the 
Department of Energy. Authors retain copyright in their 
papers. 


Ginsparg writes, “many of the lessons learned from 
these systems should carry over to other fields of 




scholarly publication, i.e., those wherein authors are 
writing not for direct financial remuneration in the form 
of royalties, but rather primarily to communicate 
information for the advancement of knowledge, with 
attendant benefits to their careers and professional 
reputations.” 


They are also interesting economically. Both services 
are completely open to the user. They are run professionally, 
with substantial budgets, but no charges are made to authors 
who provide information to the service or to readers who use 
the information. 


Both of the aforementioned services were well 
established before the emergence of the web. The web has 
been so successful that many people forget that there are 
other effective ways to distribute information on the Internet. 
Both the Los Alamos archives and the RFC series now use 
web methods, but they were originally based on electronic 
mail and file transfer. 


5.2. GETTING STARTED WITH INTERNET 

The basic requirements to get onto the Internet are: a 
computer, a modem, a communication link (dial-up or leased 
line access), and a password or login address. 

First we need an Internet connection to start Internet 
access. Three types of Internet connection are available: 
Full Connection, Dial-up Connection and Gateway Connection. 

Full Connection : This is possible only if there is a 
permanent telecommunication link and the computer has a 
registered Internet name and address. 

Dial-up Connection : It is through a temporary 
telecommunication link to a computer that has full access. 

Gateway Connection : It is where the connection is 
made through another network or service supplier. 




In order to browse and find information, users 
always require the services of search engines. In addition 
to these, there are numerous directories of Internet resources. 
One of the most valuable of these is BUBL (Bulletin Board 
for Libraries). BUBL is an information service designed to 
support Library and Information Science professionals. 
Amongst a wide range of services, it includes directories of 
resources, users and hints, current contents of LIS journals, 
and electronic journals and texts. 


The key to connecting to the Internet is the modem. As 
mentioned, it is the translator between the computer and the 
telephone system, converting the digitized information that 
a computer generates into sound waves. Modems depend on 
software to instruct them what to do. Some software is 
designed specially for particular purposes; some is generic, 
and some works only with particular Internet providers. Most 
academic institutions have their own Internet connection mode 
and provide accounts to students. Some universities have a 
dial-up connection that requires only a basic communication 
programme. 


5.2.1. Internet Service Providers and Popular Networks in 
India 


An Internet service provider (ISP) is the bridge between the 
Internet and customers, with a point of presence on its 
network. Customers should have a connection to one of the 
ISP’s servers through either dial-up or a leased line. Until 
the early 1990s, Internet access was possible only through 
some kind of dedicated line connection. Later, with the 
introduction of various Internet access providers, dial-up 
accounts became available. By early 1994, more than a hundred 
commercial dial-up Internet access providers were offering 
services in North America alone. Now there are various ways to 
get Internet connectivity, ranging from dial-up facilities to 
satellite links. 




Today’s information networks have broken down all the 
barriers of time and space, continuously raising users’ 
expectations and creating demand for more and more 
sophisticated, high quality information products and services. 
Today in India there are various networks offering online 
services through the Internet. Some of them are as follows: 


ERNET : ERNET was started in November 1986 by the Dept. 
of Electronics (DoE) with the financial support of the Govt. 
of India and the United Nations Development Programme 
(UNDP). 

The Educational and Research Network (ERNET), 
implemented by the Department of Electronics (DoE), has over 
400 organizations connected within India and neighbouring 
countries. These institutions are mainly academic and 
research organizations, non-governmental organizations and, 
to a limited extent, private and international organizations. 
With several backbone sites, ERNET covers DoE (New 
Delhi), IIT (Chennai), IISc (Bangalore), IMTECH 
(Chandigarh), VECC (Kolkata), IUCAA (Pune), NCST 
(Mumbai) and the University of Hyderabad, which enables 
organizations located at different geographical locations to 
access various services of the Internet. ERNET is actively 
engaged in providing Internet connectivity to member 
universities in the UGC-INFONET programme. 


NICNET : The National Informatics Centre Network
(NICNET) has been operational since 1987. NICNET covers all
district headquarters, state/UT capitals and the national capital.
Internet connectivity through NICNET has been available via
VSNL since 1995. It provides services to Govt. organizations
in the country by using the satellite-based data communication
network of the National Informatics Centre (NIC). NICNET has
been offering Internet connectivity through dial-up mode under
the service name RENNIC (Research and Education Networks)
to academic and research organizations. It is estimated that
more than one million users in 8000 institutions are using




the NICNET facility in India. NICNET has become an important
network facilitating decision-making in India.


VSNL (Videsh Sanchar Nigam Ltd.) : Videsh Sanchar
Nigam Limited (VSNL) started its Internet service in
August 1995; at that time it was the only ISP. It acts as the
gatekeeper to Internet connectivity.


Softnet : The Dept. of Electronics, Software Technology
Park of India (STP) launched SOFTNET in 1993, which
offers Internet service in collaboration with Section
Service, India for software development companies. STP has
six centers located at Bhubaneswar, Trivandrum,
Hyderabad, Gandhinagar, Noida and Bengaluru.


Satyam : Satyam Online was launched in December
1998 as the first private Internet Service Provider to announce
Internet service in 12 major cities in India. It proposes to
cover more than 40 cities.


Mahanagar Telephone Nigam Ltd. (MTNL) : MTNL’s Internet
service was launched in March 1999 in Mumbai and Delhi to
provide Internet access to the general public.


After getting services from an Internet Service Provider
(ISP), computers are connected either physically through a
cable connection, particularly optical fibre cables, or
wirelessly using a Wi-Fi system. Once the Internet connection is
established, the basic functions of the communication software
are to be explored. Though expertise is not needed for this, it
is important to understand the uploading and downloading of
information. Uploading is the process of sending a file from
one computer to another on the network, and downloading is the
process of retrieving a file from a distant computer and copying
it onto one’s own computer.


5.3. FACILITIES ON INTERNET 


There are various Internet facilities such as E-MAIL, 
WAIS, TELNET, GOPHER, VERONICA & JUGHEAD, FILE 




TRANSFER PROTOCOL (FTP), etc., with the help of which
information can be retrieved. It is always better to get into
the habit of using all the tools available.


5.3.1. E-Mail Facility 


Most students and researchers are very familiar with
e-mail and use it extensively, either
to keep in touch with friends and colleagues or for
sending and receiving information. It is the most widely used
tool on the Internet.


Now millions of people around the world have e-mail
accounts. As with regular mail, users send messages to other
people using their unique addresses. Instead of being trucked
from post office to post office, the message moves through
several computer systems, each one closer to the recipient’s
address or home computer. E-mail can circulate not only
between individuals but also among the members of a group. It
is also possible to subscribe to electronic magazines and
newspapers through e-mail.


An e-mail message reaches its destination in any part
of the world within minutes, rather than taking
several days, weeks or months. As soon as the mail arrives,
a reply can be sent, and the sender may get the reply even
before leaving the computer. Though we pay for this
access, it is far cheaper, quicker and more convenient than
any other mailing system.


5.3.2. Telnet Facility 


It is one of the basic Internet search programmes, which
is used for logging on to other computers. By typing the 
address of a distant computer, we can “Telnet” to that location, 
search a library catalogue, read a journal article or use a 
periodical index as if it were on our own computer or nearby. 


Telnet is an older text based programme. But since the 




researchers are in search of materials for writing their papers,
they are mostly seeking databases containing large text files.
It is also possible to ‘Telnet’ to the e-mail stored on
the home computer from a distant place. A guest account on
a local computer can be opened, and from it one can Telnet
back to the home computer to get any personal information
stored there. But since this means logging into someone else’s
computer, a password is needed to use this service.


5.3.3. Gopher Facility 


This facility was a boon to researchers and other users of
the Internet. Until it was developed, a person could only
‘telnet’ to a location whose address was already known.
Gopher is a menu system that allows us to browse sources all
over the world without having to look up the Internet address;
it will ‘go for’ the place in search of the document asked for.
Only a few commands are needed to navigate within any gopher,
irrespective of the place where the particular information is
located. The menu also warns the user when he is about to
leave gopher space and tells him how to get back in case of
any difficulty.


It also helps to search for information on a particular
subject area, no matter where it is located or in what form.
We can even say that without this facility the Internet is
just like a library without a catalogue, where all the vast and
rich knowledge is held in many formats such as
books, journals and patents, but without a device to find
any of them. Gopher serves the same function as a
catalogue in a library. Finding a minute piece of
information is otherwise like searching for a needle in a
haystack; this problem is solved by this facility, just as by an
OPAC. To get into the Gopher programme, it is necessary first
to ‘telnet’ to a computer that offers it. Then it will
automatically connect to the places that the person wishes to
visit.


5.3.4. WAIS 


Wide Area Information Servers (WAIS) is the best
network programme for doing scholarly research. It acts
as a subject index that allows a wide search through
the stored databases within a discipline. Just like Gopher,
WAIS finds information regardless of where it is actually
located.


5.3.5. Veronica and Jughead 


Veronica and Jughead are search tools that help to find
information in Gopher space. Jughead collects searchable
menu information from individual gopher servers. Veronica is
more powerful and can search actual files and not only the menus.
Jughead and Veronica are menu choices on most Gophers.


5.3.7. File Transfer Protocol (FTP) 


It is a file transfer programme designed to transfer files
from one computer or server to another. It is often used as
“anonymous” FTP, in which we need not have an account on
the computer where the files are stored; no personal password
is needed.


Files are moved to a computer that has a full Internet
connection. If the user’s computer has only a dial-up connection,
i.e., it is connected to a host computer which has the Internet
connection, FTP has to go through a two-step process. First, the
file is moved to the host computer, and only then to the
dial-up computer. For large file transfers, it is better
to use a computer having a direct, full Internet connection.
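
The transfer of files described above can be sketched in Python with its standard ftplib module. This is only an illustration and not part of any particular library system: the host name ftp.example.org and the file names are hypothetical, and the anonymous login assumes that the remote server permits it.

```python
from ftplib import FTP


def download_file(ftp, remote_name, local_path):
    """Download one file in binary mode over an open FTP session."""
    with open(local_path, "wb") as local_file:
        # RETR asks the server to send the file; each received block
        # is handed to local_file.write.
        ftp.retrbinary("RETR " + remote_name, local_file.write)


def upload_file(ftp, local_path, remote_name):
    """Upload one file in binary mode over an open FTP session."""
    with open(local_path, "rb") as local_file:
        # STOR asks the server to store the data read from local_file.
        ftp.storbinary("STOR " + remote_name, local_file)


def fetch_anonymously(host, remote_name, local_path):
    """Log in as 'anonymous' (no personal account) and fetch a file."""
    with FTP(host) as ftp:   # opens the connection to the server
        ftp.login()          # no arguments means an anonymous login
        download_file(ftp, remote_name, local_path)


# Hypothetical usage (needs a real server that allows anonymous FTP):
# fetch_anonymously("ftp.example.org", "README", "README.txt")
```

Downloading and uploading are kept as separate functions so that the two directions of transfer mentioned earlier in this chapter can be seen side by side.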


5.3.8. World Wide Web (WWW)


The World Wide Web is a distributed information system
on the Internet. It can contain text, images and other
types of data and
information. Documents are stored in hypertext form with links




to other relevant documents that may be kept on various
machines around the network. Users can retrieve these
pieces of data and information from wherever they reside on
the Web by pointing and clicking with a mouse, and can interact
with services such as databases by using forms and menus.
There are several programmes available that allow a user to
view, write and publish documents on the Web. Information
can be located on the WWW by using subject indexes, web
crawlers and on-line databases.


New capabilities and information resources are 
continuously created on the web making it a very dynamic 
system. However there are limitations due to the rigid structure 
of the hypertext model and the chaotic distribution and 
management of Web resources. These difficulties pose some 
challenges for future development of the WWW. 


The main advantage of the WWW is that it provides an
easy and uniform way to access information from around the
world by unifying the various methods of obtaining information
from the Internet into a single and simple interface. A variety
of information is available on the Web. People have created
documents that contain information on current events,
academic research, commercial advertising, information on
recreational activities and hobbies, up-to-the-minute weather
information, images and computer software. New
documents and new search, retrieval and computational
capabilities are continuously being placed on the Web. To
interact with and to access this information, a user can employ
a variety of browsers to download and view Web documents.


5.4. SURFING THE INTERNET 


The Internet is surfed with the help of a web browser. A web
browser is a software program that allows one to view the images
and text available on the Internet. Netscape, Internet Explorer and
Google Chrome are web browsers, which allow searching


for information on Internet. 


One can surf the Internet by going directly to a web
address. The terms “Web address” and “URL” are used pretty
much interchangeably. They stand for the series of letters and/or
numbers typed in to move to a homepage on the Internet. For
instance, the web address for RRL is http://rrlbhu.res.in, and
typing this will open the homepage of RRL, Bhubaneswar,
where the user can find relevant
information. The other way is searching or surfing
through search engines. A search engine is a program that
looks through the information sources on the Internet and
gives a list of websites.
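
The make-up of a web address can be seen by splitting one into its parts. The following is a minimal sketch using Python’s standard urllib.parse module; the address shown is a hypothetical one chosen only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical address used only for illustration.
address = "http://www.example.org/library/catalogue.html"

parts = urlsplit(address)
print(parts.scheme)   # the protocol used to fetch the page
print(parts.netloc)   # the name of the computer (host) holding the page
print(parts.path)     # the location of the document on that host
```

The browser uses the first part to decide how to talk to the remote computer, the second to find that computer, and the third to ask it for a particular document.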


There are many “search engines” (Lycos, Infoseek,
Google, Yahoo, Excite, AltaVista). Most of these search
engines have subject category databases. There are two
strategies for choosing search terms:


— Put in larger, more general search terms (for example,
aquaculture), then look through the hundreds of sites
found and pick out the most interesting ones, or


— Put in more specific search terms of interest (e.g.,
journals in aquaculture), which will probably leave
fewer sites to go through.


5.4.1. Art of Surfing 


The Web is more than just a reference tool. It is also a
place to hang out, look around, and explore. And yet
sometimes it feels hard to navigate. Whether the homepage
is Netscape, Microsoft, Yahoo, AltaVista, AOL or your ISP, it
pays to know where else one can go to get started. Many
sites specialize in filtering the Web, singling out the best, the
newest, and the weirdest sites available. Some sites feature
a different link every day or week.




There are several places to find new and notable sites.
Yahoo! What’s New presents five daily picks every weekday,
as well as links to live Net events, new additions to Yahoo,
and selected Internet starting points such as the Dilbert
cartoon and Astrologer horoscopes. Along the same lines,
Netscape’s What’s New features new and notable sites, each
annotated with a short description. Finally, there is What’s
New Tool, not a highly filtered list like the other two, but more
of a listing service. The quick turnaround time makes it
interesting because many of the sites listed are brand new to
the Net. There are plenty of other sites that specialize in “new”
resources. Browse them in Yahoo!’s “Indices to Web
Documents: What’s New” category and bookmark your
favourites.


Of course, surfing the Web is not just about finding the
latest or greatest. There is plenty of valuable information online
that has been there for some time. But how does one find it
when not looking for something specific?
Searchable databases are not the only source for topic-based
surfing; there is also the Argus Clearinghouse.


Very often, people include their own collections of 
related links on the sites they build. For example, a fan’s 
baseball site may include a handful of links to other terrific 
baseball sites. A site about making muffins may include links 
to other sites about baking, or desserts in general. It is this 
type of linking that makes the Web a “web” in the true sense 
of the word. And as a surfer, one can take advantage of other
people’s explorations. The next time you visit one of your
preferred sites, look for the links.


Sometimes people build pages that are simply 
collections of links. They are often huge lists of sites that cover 
a particular subject, or set of subjects. Whenever surfing one 
can look for the category. That is where the links live. 


5.4.2. Search Engines and Search Techniques 


There are many search engines. Some are general,
while others allow searching only within a specific
subject.


If one knows exactly what one is looking for, or even
has a general idea, one should try the Yahoo search engine.
After a keyword or set of keywords is specified, Yahoo
searches its entire database to find listings that match the
keywords provided.


After specifying keyword(s) inside the query box, click 
on the search button. Yahoo will search through the five areas 
of its database for keyword matches. The five areas are: 
Yahoo! Categories, Yahoo! Web Sites, Web Pages, News 
Stories, and Yahoo! Net Events. 


The first page returned to the screen will be a list of
matching Yahoo Categories followed by a list of matching
Yahoo Sites. If no matches in Yahoo Categories and Sites
are found, Yahoo will automatically perform a Web-wide,
full-text document search using the Inktomi search engine.


One can navigate through matches from Yahoo!
Categories, Yahoo! Sites, Web Sites, Yahoo! News and
Yahoo! Net Events by clicking on the links in the bar at the
top and bottom of the page. A searcher who wants to
customize the search further has two options at his or her
disposal:

— go to the search options page and follow the 
instructions, or 


— specify options along with the keywords inside the query
box using the advanced search syntax.


Yahoo! first finds all the keyword matches and then
sorts the results according to relevancy within each specific
area. Yahoo! ranks results in the following manner:




Multiple Keyword Matches: Documents matching more
of the keywords will have a higher rank than those matching
fewer.


Document Section Weighting: Documents in which the
keywords are found in the title are ranked higher than those
in which they are found only in the body or URL.


Generality of Category : Categories matching higher
up in the Yahoo! tree hierarchy (more general categories)
are ranked higher than those deeper in the hierarchy (more
narrowly focussed categories).


Better Searching : Use double quotes around words
(e.g., “aquaculture database”) that are part of a phrase.
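
The three ranking rules above can be illustrated with a small Python sketch. It is not Yahoo’s actual algorithm, which is not described here in detail; the weights (2 for a title match, 1 for a body or URL match), the "depth" field standing for a category’s level in the hierarchy, and the sample records are all assumptions made for illustration.

```python
def score(document, keywords):
    """Return a sortable relevance key for one document (higher is better)."""
    title = document["title"].lower()
    body = document["body"].lower()
    url = document["url"].lower()

    matched = 0          # rule 1: how many different keywords match at all
    section_weight = 0   # rule 2: title matches count more than body/URL
    for word in keywords:
        word = word.lower()
        in_title = word in title
        in_other = word in body or word in url
        if in_title or in_other:
            matched += 1
        if in_title:
            section_weight += 2   # assumed weight for a title match
        elif in_other:
            section_weight += 1   # assumed weight for a body/URL match

    # rule 3: a smaller depth means a more general category
    generality = -document["depth"]
    return (matched, section_weight, generality)


def rank(documents, keywords):
    """Sort the matching documents from most to least relevant."""
    hits = [d for d in documents if score(d, keywords)[0] > 0]
    return sorted(hits, key=lambda d: score(d, keywords), reverse=True)


# Hypothetical sample records.
documents = [
    {"title": "Fish farming news", "body": "aquaculture database updates",
     "url": "http://example.org/a", "depth": 3},
    {"title": "Aquaculture database", "body": "records on fish farming",
     "url": "http://example.org/b", "depth": 2},
    {"title": "Gardening tips", "body": "soil and seeds",
     "url": "http://example.org/c", "depth": 1},
]

for doc in rank(documents, ["aquaculture", "database"]):
    print(doc["title"])
```

With these sample records, the document carrying both keywords in its title is listed first, the one carrying them only in its body comes second, and the unrelated document is left out.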


Advanced search syntax is available within Yahoo’s
search query box. Using the syntax allows one to tailor
search results better without having to visit the Search Options
page. It also allows one to tap into features currently not offered
on the Search Options page. There are four types of query syntax
available:


— Required and Prohibited Search Words, 
— Document Section Restrictions, 

— Phrase Matching, and 

— Wildcard Matching. 


Required and Prohibited Search Words : Attaching one
of the following operators will either require or prohibit words
from appearing in the search results.


• Attaching a + to the front of a word requires that the word be
  found in all of the search results.


• Attaching a - to the front of a word requires that the word
  not be found in any of the search results.


Document Section Restrictions : Attaching one of the 
following operators to the front of a search word will restrict 


the search to certain document sections.

• t: will restrict searches to document titles only.

• u: will restrict searches to document URLs only.


Phrase Matching : Putting quotes around a set of words
(e.g., “great barrier reef”) will only find results that match the
words in that exact sequence. This is known as phrase
matching.


Wildcard Matching (*) : Attaching a * to the right-hand
side of a word (e.g., cap*) will return all words that begin with
those letters (left-side partial matches).


Besides, one can combine any of the query syntax as
long as the operators are combined in the proper order. The
proper order is the same order in which the operators
are listed on this page, that is: +, -, t:, u:, “” and lastly *. For
example, +t:football -American is correct, whereas
t:+football -American is incorrect.
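
How such a query might be read by a program can be sketched in Python as follows. This is a simplified illustration written for this chapter, not the code of any real search engine: the function names, the regular expression and the sample record are all assumptions.

```python
import re

# One term: optional + or -, optional t: or u:, then a quoted phrase or a word.
TERM = re.compile(r'([+-]?)(t:|u:)?(?:"([^"]+)"|(\S+))')


def parse_query(query):
    """Split a query into terms together with their operators."""
    terms = []
    for sign, section, phrase, word in TERM.findall(query):
        text = phrase or word
        terms.append({
            "required": sign == "+",
            "prohibited": sign == "-",
            "section": {"t:": "title", "u:": "url"}.get(section, "any"),
            "text": text.rstrip("*").lower(),
            "wildcard": text.endswith("*"),
        })
    return terms


def term_matches(term, document):
    """Check one parsed term against a document's title, URL and body."""
    if term["section"] == "title":
        haystack = document["title"]
    elif term["section"] == "url":
        haystack = document["url"]
    else:
        haystack = " ".join(
            [document["title"], document["url"], document["body"]])
    if term["wildcard"]:
        # cap* matches any word that begins with "cap"
        pattern = r"\b" + re.escape(term["text"]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term["text"]) + r"\b"
    return re.search(pattern, haystack.lower()) is not None


def document_matches(query, document):
    """Apply required (+) and prohibited (-) terms; other terms are optional."""
    terms = parse_query(query)
    for term in terms:
        found = term_matches(term, document)
        if term["required"] and not found:
            return False
        if term["prohibited"] and found:
            return False
    # At least one term that is not prohibited must be present.
    return any(term_matches(t, document)
               for t in terms if not t["prohibited"])


# Hypothetical sample record.
page = {"title": "Football in Europe",
        "url": "http://example.org/soccer",
        "body": "club results"}
print(document_matches("+t:football -American", page))
```

Written in the wrong order, as in t:+football, the sketch reads the + as part of the word itself rather than as an operator, which shows why the order of the operators matters.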


One can restrict a search to documents that are
more recent than a certain date. But there is no search query
syntax available for restriction by time. This feature can only
be accessed from the search options page.


One can also customize the number of search results
displayed on all result pages following the summary page.
However, there is no search query syntax option available
for customizing the number of displayed results. This feature
can only be accessed from the search options page.


The AltaVista search engine helps to find Web pages, news,
discussions, products, images, videos or audio clips on the
World Wide Web. Here is how it works. One tells the
search service what one is looking for by typing keywords,
phrases, or questions in the search box. The search service
responds with a list of all the Web pages, news,
discussions and products in its index relating to those topics.
The most relevant content will appear at the top of the results.




The Alta Vista search box contains the following 
elements: 


Text Entry Box : This is where one types the words
related to the information sought. One can type the
inquiry in the form of a question, a statement or a phrase, or
just list a few words related to what one is trying to
find.


Search Tips and Examples : Under the text entry box
is a tip or an example on how to use AltaVista search more
effectively. Tips and examples are changed regularly, so be
sure to notice them on each visit to improve the search
experience.


Language Drop-down Menu : Using the language
drop-down menu, one can restrict results to information written
in the language specified. Currently, 25 languages are supported.


Find Results On: The round buttons below the search
box are used to indicate where you would like to search for
information. The default setting is the Web, meaning that
AltaVista will search all of the World Wide Web to find results.
One may choose to search only in news items, only in the
shopping areas, or only in discussion groups, depending on
the information sought.


Tabs : Above the search box, there are three labelled 
tabs. The first tab is for Search. The second is for the 
Advanced Search, and the third tab is for Images and Audio 
& Video. If you are looking for multimedia instead of textual 
information, you can search this area of the Web also. 


Besides, the search box provides links to the Help page, 
to the Family Filter setup page and to the page where you 
can specify your Alta Vista language settings. 


To start a search, go through the following steps:




— Type the keyword(s) of what one is looking for in the 
search box. 


— Use the buttons below the search box to indicate
whether to search the Web, news,
shopping or discussion items. The default is the Web.


— To restrict the search to a specific
language, choose the desired language
from the dropdown menu.


— Click the Search button to begin the search.


Some basic help topics are as follows.


What is an Index? : The index is a large and growing,
organized collection of Web pages and discussion group
pages from around the world. The index becomes larger every
day as people send in the addresses of new Web pages. The
service also has technology that looks on its own for links to new
pages on the Web. When you use the search service, you are
searching the entire collection using keywords or phrases.


What is a Phrase? : A phrase is a group of words or
numbers linked together. Phrases are used when you want
specific words or numbers to appear together in your result
pages. If you want to find an exact phrase, use “quotation
marks” around the phrase when you enter words in the search
box.

You can also create phrases using punctuation or 
special characters such as dashes, underscore lines, 
commas, slashes, or dots. Some examples are: 

Try Natural Language Queries: Type a phrase or a
question such as “Where can I find a database on
aquaculture?”

Use Exact Phrases : If one knows an exact phrase,
one should put it in quotes, for example, “air
pollution”.




Search for Web Pages in a Specific Language: Using
the Language pull-down menu in the search box, one can find
all the documents on the Web about a given topic, written
only in the language specified. This type of search excludes
Web sites written in any other language. But this feature is
only available for Web page searches. If one selects French
in the language dropdown menu when searching for escargot,
one will see only pages written in French that include the word
escargot.


Use Lowercase Text in your Searches: When you use
lowercase text, the search service finds both uppercase and
lowercase results. When you use uppercase text, the search
service finds only uppercase results.


For example, when you search for california, you will find
California, california, and CALIFORNIA in the result pages.
However, when you search for CALIFORNIA in upper case, you
will see only CALIFORNIA in the result pages.
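
The rule about lowercase and uppercase text can be expressed as a short Python sketch. The function name is hypothetical, and the treatment of mixed-case queries (requiring exactly the same capitalization) is an assumption, since only the all-lowercase and all-uppercase cases are described above.

```python
def case_rule_matches(query_word, page_word):
    """Apply the lowercase/uppercase rule to a single word."""
    if query_word == query_word.lower():
        # All-lowercase query: capitalization on the page is ignored.
        return query_word == page_word.lower()
    # Query contains capitals: require exactly the same capitalization.
    return query_word == page_word


for found in ["California", "california", "CALIFORNIA"]:
    print(found,
          case_rule_matches("california", found),
          case_rule_matches("CALIFORNIA", found))
```

This is why typing search words in lowercase is the safer habit: it never excludes a page merely because of the way a word is capitalized there.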


Include or Exclude Words: To make sure that a word
is always included in the search, place a plus sign (+)
immediately before the keyword (no spaces) in the search box.
To make sure that a word is always excluded from the search,
place a minus sign (-) immediately before the keyword (no
spaces) in the search box.


As an example, to find recipes for chocolate cookies
without chips, try recipe cookie +chocolate -chips.


Use Wildcards : By typing an asterisk (*) at the end of
a keyword, one can search for multiple forms of the word. Try
big* to find big, bigger, biggest, and bigwig.


Use Special Characters and Punctuation: Alta Vista 
Search defines a word as any combination of letters and 
numbers that is separated by any of the following: 


White space, such as spaces, tabs, line ends, or the 




start or end of a document. Special characters and 
punctuation, such as %, $, /, #, and _. 


AltaVista interprets punctuation as a separator for
words. Placing punctuation or a special character between
each word, with no spaces between the characters and the
words, is another way to indicate a phrase. Example: Entering
Jean-Luc Picard is easier than entering “Jean Luc” Picard,
which is acceptable but requires more keystrokes.


Hyphenated words, such as x-files, are also considered
phrases because of the hyphen. If we use special characters
to indicate phrases, we should be careful to avoid *, +, and -,
since they perform unique functions. Besides, you may prefer to
use double quotes in the phrases to avoid confusion.
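
The way punctuation separates words, and so creates phrases, can be sketched in Python as below. This is only an illustration of the rule as described above, not AltaVista’s own code; the function names are hypothetical, and for simplicity the sketch assumes that words consist of unaccented letters and digits only.

```python
import re


def split_into_words(text):
    """Split text into words, i.e. runs of letters and digits."""
    # Anything else (spaces, %, $, /, #, _, hyphens and so on)
    # acts as a separator between words.
    return re.findall(r"[A-Za-z0-9]+", text)


def implied_phrases(query):
    """Return the word groups joined by punctuation with no spaces."""
    phrases = []
    for chunk in query.split():        # white space separates the chunks
        words = split_into_words(chunk)
        if len(words) > 1:             # e.g. Jean-Luc gives Jean, Luc
            phrases.append(words)
    return phrases


print(split_into_words("x-files"))
print(implied_phrases("Jean-Luc Picard"))
```

In the second call the hyphen binds Jean and Luc into one phrase, while Picard remains an ordinary separate word.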


Additionally, AltaVista searches more than just text. You 
can use special keywords to search for matches in such things 
as page titles or domains. 


The major tips are summarized in Table 5.1 below: 


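Table 5.1. Features for Typical Searches

----------------------------------------------------
Search Tips       Yahoo     AltaVista     Google
----------------------------------------------------
Included word     +         +             +
Excluded word     -         -             -
Title only        t:        -             -
URL only          u:        -             site:
Phrase match      " "       " "           " "
Wildcard match    *         *             -
Combination       All       All           All
----------------------------------------------------
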

The Google search engine is also available and works in an
almost similar fashion. There are also a number of other search
engines, including meta search engines, for searching the
Internet. The search commands and strategies differ more or
less from each



other. Search logic is most important for surfing the Internet.
Before selecting a search engine, one should know its search
techniques, logic of searching, subjects covered, etc. for
effective search results.


5.5. INTERNET IN LIBRARIES AND INFORMATION 
CENTRES 


The Internet is the gateway for libraries and information
centres to enter the electronic information era. The Internet
provides the information generated by different
organizations, institutions, research centres and individuals
all over the world. Some of the important resources available
on the Internet include E-Journals, Preprinted Materials,
Bibliographical tools, Books, Dictionaries, Encyclopedias,
Directories, Reports, Patents, Standards, Library catalogues,
Newspapers, Magazines, Databases, Files, Audio, Video, and
Home pages of companies, institutions, organizations,
associations, individuals, experts, etc. The coming days
are the days of the electronic era. Since information is moving
all around us, we have to catch it and make it available to our
users. This can only be done through the Internet. Every library
and information centre in our country should be
connected to the Internet and should create a home page of the
library and its holdings, so that we may stand on par with the
rest of the world and be able to enter the digital information era.


Since the privatization of the Internet in early 1995,
opportunities for connectivity have expanded considerably,
with fierce competition in the commercial sector to market
Internet services for business purposes. The involvement of
librarians and other information professionals in training for
Internet use and in the organization of information to be used
on the Internet has also increased considerably.


Locating information through the Internet remains a
challenge for librarians. Reference librarians must




continue to develop their role as “intermediaries” in the
information-seeking process. Information professionals have
a keen interest in the development of software tools. The time
is not far away when libraries in their current form will turn
into electronic libraries. The use of the Internet as a
communication and current awareness tool will enable
librarians to accomplish more, and faster. E-mail is the
single most popular means of meeting a large number of
knowledgeable people across the expanse of the planet.


Listserv electronic discussion groups usually focus on
a specific topic, and there are many on library-related topics.
Listservs facilitate ongoing daily discussion about numerous
issues, and they are most useful for finding a solution to a
problem when in-house expertise is not enough. The
British Library Association has recommended that libraries
should:


— Use their skills to identify information, whether in text,
image or sound, and route it as appropriate to people
who need it.


— Provide network access points, free or charged as
appropriate, and provide opportunities for education and
training in the use of the network.


— Use open information systems and communication to
integrate use of the network with mainstream library
services.


— Publish appropriate information, e.g., catalogues.


— Apply their skills to the management of the vast amounts
of information and archives over the network.


Thus, librarians find a unique and useful place in the
field of information technology. While Internet access
provides an enormous amount of information in thousands of
areas of the users’ choice, it is the responsibility of librarians
to devise



ways and means to arrange it systematically for easy
retrieval. They must also help newcomers to the Internet. Their
major role would be as information providers and designers
for the Internet. The traditional enemies of the librarian, i.e.,
synonyms, antonyms, and homonyms, will play a key role here.
It is the duty of librarians to control them properly, so that
users do not miss any link while searching. Finding
ways and means of easy access to specific or minute
information in the ocean of knowledge that is available on
the Internet is a new area of study that needs much attention
in the modern world.


Now users are exposed to a Global Library. The
bibliographic data of the libraries of the entire world is at our
fingertips. The Internet is playing an important role in
discharging the functions of libraries. It is changing the ways of
organizing, managing and disseminating information. With more
documents getting published electronically and Internet
resources growing fast, libraries of the 21st century have to shift
towards electronic means of acquiring, processing and
disseminating information. The Internet is a boon to the
information profession, whose main aim is to provide information
to its clientele. It is greatly influencing the practice of
librarianship.


Today all sorts of library services from membership 
registration to document delivery can be offered through the 
internet. Some of the important library services that can be 
offered through the internet are as follows: 


5.5.1. Collection Development 


The collection plays a very important role for users. The
Internet has given new meaning to the process of
collection development, as there is a clear shift from the
concept of ownership to that of accessibility. Large numbers of
documents are accessible on the Net; a few of them are available
free and the rest against payment. Extensive access to




information resources has proved to be very helpful for
financially starved libraries. With Internet access, libraries
are able to achieve economy, as they are shifting towards a
consortia approach to acquire access and subscribe to the
material they need. Acquisition of documents in e-form is
becoming the need of the day.


5.5.2. Acquisition of Document 


With the application of the Internet, the acquisition process
has become much better and many of the problems related
to acquisition have been solved. Today most
publishers and booksellers have their web sites on the Internet
and place there their regular catalogues and leaflets of new
publications. Some of the publishers of primary journals, like
American Chemical Society, IEEE (USA) and Elsevier Science
Publisher, are providing their journals online. The IDRC,
Canada is providing books on research and development that
can be ordered online through the URL http://www.idrc.co. It
also publishes its best reports online, which are also available
at the web site http://www.idrc.co. CAB Publishing has recently
launched a series of subject-based online communities catering
to the needs of librarians and researchers; each community
features a comprehensive abstract database with a 25-year
archive. Examples of some useful sets of links available
through the Internet for acquisition are:


— Association of Learned and Professional Society 
Publishers at http://www.alsp.org.uk/member.html. 


— Ingenta Journals provides access to bibliographical 
information from more than 550 journals from Academic 
Press, Royal Geographical Society, White Horse Press 
and Harwood Academic, etc., which can be searched without 
restriction from http://www.ingenta.com. 


— ARL Directory of Electronic Journals, produced by the 
Association of Research Libraries, gives information on 
electronic journals and newsletters along with details 
of subscription. 


— Britannica Online offers the world’s first online 
encyclopedia. Libraries can provide access to their 
readers by paying a registration fee. Britannica 
Online has the advantage of giving access to articles not yet in 
print and to the Britannica Book of the Year, at 
http://www.eb.com/. 


— The Amazon.com books web site provides access to a large 
selection of books, with over one million titles searchable 
by keyword, author, title or subject. The 
site also has provision for purchase via Netscape’s 
secure commerce server or over the phone. Its address 
is http://www.amazon.com. 


Library and information professionals can easily 
browse through the current publications available on various 
web sites in their area of interest, confirm the prices, etc. and 
place orders online. Any discrepancy in the invoices or 
bills, the edition of books, printing, etc. can be clarified within 
minutes through e-mail; thus much of the work is reduced. It 
is expected that in the near future the Internet will become the 
mechanism for distribution of three-fourths of the specialized 
journals and also the major medium for transfer of research 
information. 


Librarians will thus need to change their attitude 
towards collection development as the technology advances. 
It will encourage access to documents rather than 
ownership. In future, virtual libraries may replace the 
traditional sources that exist in physical form. 


5.5.3. Technical Processing 


The preparation of a standard catalogue without much 
effort has become possible due to the Internet and the World 
Wide Web. Librarians can check the catalogues of other 
libraries, like that of the Library of Congress, and confirm the 
information required for a record, which can be easily 
ascertained from the original document. Library 
professionals can also use the Internet for 
verifying and downloading bibliographical information 
from the OPACs of other institutions, which have become 
a popular source of bibliographic information. Libraries can 
make use of other institutions’ OPACs to get the information 
they need to organize knowledge. Databases of bibliographic 
utilities will become a more comprehensive source of 
information than has so far been possible with their present 
catalogues. With advanced information retrieval facilities, 
libraries will in future add value by using catalogues of 
journal articles. 


The electronic documents can be supplied to the users 
on demand through the network. According to Schmidt, 
“access to OPACs will be increasing from outside the library. 
The boundaries between the cataloguing of libraries holding 
and cataloguing of information will be more difficult than today, 
in my opinion they will vanish completely when networks have 
reached a certain technical capacity.” 


The Internet has also affected the traditional classification 
systems of our libraries. Several libraries are opting for the Cyber 
Dewey Decimal Classification Summaries as a way to 
organize and navigate resources on the WWW. The Cyber 
Dewey website includes an alphabetical index to Dewey. The 
Dewey home page (http://www.oclc.org/pp) contains 
links to some of these systems. Joan Mitchell, the editor of 
DDC, says, “it is an exciting time to be a Dewey user because we 
have a commitment to keep pace with knowledge, to help our 
users classify efficiently and help our users extend from the 
shelves of their libraries into the electronic environment.” 


5.5.4. Circulation 


The Internet has also made the circulation of in-house 
documents much easier. After technical processing, 
new books and documents can be placed in the OPAC on the day 
of acquisition itself, and readers with an Internet connection at 
home or at the university can browse and reserve the books while sitting 
at their offices or at home, within seconds of the arrival of the 
book in the library. 


Further, libraries subscribing to electronic journals need 
not necessarily provide each reader with a user ID. A reader can obtain 
the user ID by enquiring at the circulation section and then access the journals 
from the department or office without taking the pain of visiting 
the library. 


Through the Internet, libraries can also access 
bibliographical databases via the OPACs of libraries of other 
institutions worldwide. The OPAC may be searched from a 
terminal located at a remote place. Besides, the electronic 
documents required by the readers can also be supplied on 
demand through the network. 


5.5.5. Information and Reference Service 


Current Awareness Service (CAS) and Selective 
Dissemination of Information (SDI) services are the most 
useful services of any good library. Internet is playing a very 
significant role in providing CAS/SDI services to its users. It 
has widened our information resources base extensively by 
providing access to global information. One can access 
abstracts, citations, bibliographic and full-text databases, 
library OPACs or other sites wherever the information is 
available. 


Libraries using the Internet can provide better 
information services, much wider in scope, at minimum cost 
and time. Reference sources like encyclopedias, dictionaries, 
directories, bibliographies, indexes/abstracts, gazetteers, and 
maps are available with up-to-date information. 


5.5.6. Resource Sharing 


Resource sharing has become an important facility due 
to the multiplying cost of material, and the Internet is being used 
heavily for it. Through the internet, users of one library can know what 
is available in the collections of other libraries. It creates a 
cooperative network that is very useful for fund-starved 
libraries. Under this programme all networked libraries make 
their resources available on the Net to be used by other 
libraries. 


5.5.7. Inter-Library Loan (ILL) 


The traditional inter-library loan operations are quite 
time consuming and labour intensive. With the advent of 
technology, the electronic documents and various inter-library 
management tools such as software like Ariel and Avis have 
facilitated the libraries to share their resources effectively and 
efficiently. 


Ariel software opens a window on internet document 
transmission. The Ariel workstation, developed by the Research 
Libraries Group, lets users send and receive crisp, clear copies 
of documents over the Internet with the speed and ease of fax. 
Avis is a Canadian product developed at the University 
of Waterloo and refined with the cooperation of inter-library 
loan practitioners in libraries across Canada and the USA. Avis 
is PC-based software designed to manage all aspects of the 
inter-library loan process. The inter-library loan office can network 
multiple Avis workstations on a local area network. It offers the 
following benefits: 


• Single comprehensive solution for the management of 
all ILL activities. 

• Effective management of the paper work and record 
keeping required in borrowing and lending an item. 

• Status tracking of requests at all stages of the ILL 
process. 

• Integration of bibliographic and location information 
from CD-ROM catalogues and online union catalogues. 

• Transparent electronic transmission of requests and 
messages through the Internet. 


Thus, with the help of such software, the Internet has 
become of great help in inter-library loan, both in lending and 
borrowing. Retrieval has become easier and transactions much 
quicker, as requests can be sent through e-mail. Small 
documents can also be sent as attachments to e-mails if 
they are in digital formats. 
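
The status tracking listed among the benefits above can be pictured with a small sketch. It is only an illustration and does not reproduce how Avis or Ariel work internally; the stage names, the `IllRequest` class and the `open_requests` helper are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical stages of an inter-library loan request.
    REQUESTED = "requested"
    SENT_TO_LENDER = "sent to lender"
    SHIPPED = "shipped"
    RECEIVED = "received"
    RETURNED = "returned"


# Allowed forward transitions (an assumption made for this sketch).
NEXT = {
    Status.REQUESTED: Status.SENT_TO_LENDER,
    Status.SENT_TO_LENDER: Status.SHIPPED,
    Status.SHIPPED: Status.RECEIVED,
    Status.RECEIVED: Status.RETURNED,
}


@dataclass
class IllRequest:
    title: str
    borrower: str
    lender: str
    status: Status = Status.REQUESTED
    history: list = field(default_factory=lambda: [Status.REQUESTED])

    def advance(self):
        """Move the request to its next stage and record the change."""
        if self.status not in NEXT:
            raise ValueError("request is already closed")
        self.status = NEXT[self.status]
        self.history.append(self.status)
        return self.status


def open_requests(requests):
    """Return the requests that have not yet been returned to the lender."""
    return [r for r in requests if r.status is not Status.RETURNED]
```

Keeping the full `history` list, rather than only the current status, is what allows a request to be traced through every stage of the process.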


5.5.8. Communication 


The Internet has become the primary mode of 
communication and carries more information than the 
combined total of the postal services of all countries in the 
world. It provides a cheap and efficient means of mail transfer. 
Libraries can use this facility extensively to communicate with 
publishers, booksellers and vendors of other library 
products and services, and with scholars, librarians and users 
across the globe. The most popular means of communication 
on the Internet is e-mail, which works like regular mail; there are also 
mailing lists addressed to groups of users. These mailing lists, often 
called listservs, can serve as a valuable resource for 
librarians. A more public electronic forum for discussion on 
the internet is called Usenet News. Usenet provides 
information on a large number of newsgroups or conferences 
that have open participation, which can be used by library 
users and library professionals. 




Thus, Internet is becoming an important tool for in- 
house library services. 


5.9. INTERNET RESOURCES 


Various types of resources are available on the 
internet, ranging from Arts & Humanities to Social Sciences 
and Sciences, both pure and applied. 
However, their depth and quantum may vary from area to 
area. 


Here, an attempt is made to present some of the 
resources in the Arts & Humanities, Social Sciences and Science 
& Technology disciplines. The listing is by no means complete, 
but it is a fairly comprehensive one. 


Let us first look at internet resources in Arts & 
Humanities. 


5.9.1. Internet Resources in Arts & Humanities 


Once you are hooked to the Internet, you have instant 
access to an almost indescribable wealth of information. You 
have to pay for some of it for sure, but most of it is available 
free. In the area of Arts & Humanities, there are a whole lot of 
multifaceted resources available for all categories of people. 
What one needs is to know the structure of these resources. 
An attempt is made to take a stock of what is available in the 
area of Humanities via Internet and supplement the 
information with suitable examples of resources available and 
accessible via Internet. 


(A) By Source Type 

The term Humanities is used to signify all those 
branches of learning which are not classified as science, either 
natural science or social science. It is the study of literature, 
music and history, as distinct from social or natural science. There 
are many types of information resources available, but here 
the sources have been categorized on the basis of the 
source type, that is, the format. There are several types of 
sources available, such as: 


• Electronic Journals 
• Current Contents of Journals 
• Books and Library Collections 
• Other Online Text and News Sources 
• On-Line Writing Guides/Labs 
• Indexing and Abstracting Service Resources 
• Subject Databases 
• Commercial Databases 
• Data Archives/Data Services 
• Document Delivery 
• Course Reserves/Teaching Resources/Training Materials 
• Discussion Lists or Forums/Usenet Newsgroups/Mailing Lists 
• Research Resources/Research Projects 
• Events and Conferences 
• Online Multimedia Projects and Exhibits 
• Campus Wide Information Systems (CWIS)/University Departments 
• Dissertations 
• Software Archives 
• Reports 
• Data Centres 
• Working Papers 
• Reference Sources 
• Directories 


ELECTRONIC JOURNALS 


Various electronic journals are available on the 
Internet. Some of the important ones are listed below. 


Technical Documentation from HW Wilson - 
Humanities Index/Abstracts with Full Text Journal List 
(http://www.hwwilson.com/journals/ahum.htm) : It is a good source 
for tracking journals in the Humanities area. The total number of 
journals in Humanities Index/Abstracts with Full Text, excluding 
name changes, is 506. This number represents active, 
ceased and dropped journal titles. 


Humanities Journal-UCSD Libraries 
(http://libraries.ucsd.edu/sage/ejournals/language_and_literature.html) : 
A large number of e-journals in the 
area of Humanities and Arts are accessible from this site. 


Yahoo! Directory-Humanities Journals 
(http://dir.yahoo.com/Arts/Humanities/Journals/) : It includes 
journal resources by subject area. In the Humanities group, 
categories include Classics, Cultural Studies, History, 
Literature and Philosophy. 


CARL UNCOVER (http://www.ingenta.com/) : Access 
to CARL UnCover is now available through the Web at 
ingenta. CARL is a computerized network of library services 
developed by the Colorado Alliance of Research Libraries. 
CARL UnCover is the Alliance’s index to journals and 
magazines. From ingenta’s homepage: Since acquiring 
UnCover in 2000, ingenta has been working to integrate the 
two databases in order to provide a more comprehensive and 
easy-to-use service. Now live, the integrated service offers 
free searching and browsing of more than 25,000 publications 
with 11,000 titles that were not available in UnCover. In 
addition, a number of new services have been introduced. 
Searching ingenta is free. Article delivery is also available, 
for a fee. One can arrange to have articles faxed, delivered, 
or sent electronically in HTML or PDF format. To 
choose this service, you will have to provide a personal credit 
card number and the articles will be charged to that account. 


Project Muse (http://muse.jhu.edu/) : Full text access 
to more than forty scholarly journals in the humanities, social 
sciences, and maths is available here. 


Anthropoetics: The Electronic Journal of Generative 
Anthropology (http://www.humnet.ucla.edu/humnet/anthropoetics/home.html) : 
Generative anthropology (GA) 
is a new mode of critical thinking that applies the criterion of 
intellectual parsimony to the study of cultural phenomena. 
Anthropoetics is dedicated to rethinking both the impasse 
between the humanities, imprisoned in the “always already” 
of our cultural systems, and the empirical social sciences. 


Australian Humanities Review (AHR) 
(http://www.lib.latrobe.edu.au/AHR/) : AHR is a peer-reviewed 
interdisciplinary electronic journal published quarterly with 
regular updates every two weeks. 


Postmodern Culture 
(http://jefferson.village.virginia.edu/pmc/contents.all.html) : It is a journal in both print and 
electronic form. PMC contains critical essays, creative work 
and reviews on postmodern culture. 


BOOKS AND LIBRARY COLLECTIONS 


There is a huge ocean of book collections available via the 
Internet. Amazon.com, the world’s largest online bookstore, 
can serve you with almost any book that you are looking for. You 
can even place orders for obtaining them. Besides this 
bookstore, there are several other collections; here we 
refer specifically to collections in the area of 
Humanities. Examples include: 




Books and Library Collections for Academics (BFA) 
(http://www.ex.ac.uk/bfa/home.htm) : Hosted by the 
University of Exeter, BFA aims to link the academic 
community directly to academic publishers’ online catalogues. 
Users can browse any number of online catalogues while 
retaining the BFA Web site navigation frame, or can perform 
searches by publisher, by subject/category, or by keyword. 
Information about editorial contacts at publishers is also to 
be included in future, as are a reviews page and links to 
libraries, museums, online journals, newspapers, etc. 


On-line Books Page (http://www.cs.cmu.edu/books.html) : 
It is an index to over 1200 on-line books and 
other documents, as well as directories and archives of 
on-line texts and exhibits. 


Books Online (http://onlinebooks.library.upenn.edu/) : 
It facilitates search of 17,000+ documents using author, title and 
subject search parameters. Besides, some serial publications 
are also listed here. 


On-line Books Page presents BANNED BOOKS ON-LINE 
(http://www.cs.cmu.edu/People/spok/banned-books.html) : 
As the title implies, this site gives a list of books 
that have been banned for various reasons. It is a good source for 
identifying banned books. 


New York Public Library Digital Library Collections 
(http://digital.nypl.org/) : It provides access to a great collection 
of Humanities books. 

Russian Studies Materials 
(http://www.departments.bucknell.edu/russian/material.html) : 
Maintained by Bucknell University, this site provides 
links to Russian art, history, language, literature, music, 
philosophy and religion. 


OTHER ONLINE TEXT AND NEWS SOURCES 


Such resources generally comprise Online text in the 
field of literature and languages, besides being the source 
for other news items. Examples include: 


The Humanities Text Initiative (HTI) 
(http://www.hti.umich.edu/) : The Humanities Text Initiative (HTI), 
housed within Encoded Text Services (ETS), is a mechanism for the 
creation and delivery of electronic texts in SGML for the Internet 
community. HTI is a unit of the University of 
Michigan’s Digital Library Production Service (DLPS), which has been 
providing online access to full-text resources since 1994. HTI 
is an umbrella organization for the creation, delivery, and 
maintenance of electronic texts, as well as a mechanism for 
furthering the library community’s capabilities in the area of 
online text. The collections on this site are freely available to 
the Internet community. Resources which are restricted to 
use by University faculty, staff, and students can be found 
at the Encoded Text Services website. ETS, also a unit of 
DLPS, delivers licensed full-text collections in Standard 
Generalized Markup Language (SGML) via the web for the 
University of Michigan community. The HTI collaborates with 
the other units of the 
University of Michigan library to select texts for conversion, 
create metadata to describe the electronic text and the source 
document, and deliver the material via the World Wide Web. 
The HTI also has partnerships with a number of other groups 
and institutions for the creation and delivery of electronic resources, 
including an online journal of book reviews, a catalogue of 
electronic texts available via the Internet, and a linguistics 
database, as well as the more familiar collections of poetry 
and prose. 




The American Verse Project 
(http://www.hti.umich.edu/a/amverse/) : It is a collaborative project 
between the University of Michigan Humanities Text Initiative 
(HTI) and the University of Michigan Press. The project is 
assembling an electronic archive of volumes of American 
poetry prior to 1920. The full text of each volume of poetry is 
converted into digital form and coded in Standard Generalized 
Markup Language (SGML) using the TEI Guidelines, with 
various forms of access provided through the WWW. In 
recognition of the effort involved in selecting, editing, 
encoding, and maintaining online the works included in the 
archive, the users are expected to abide by the conditions of 
use. The site facilitates: 


Simple Searches: Single word and phrase searches 
throughout the entire corpus. 


Proximity Searches: Find the co-occurrence of two or 
three words or phrases. 


Boolean Searches: Find combinations of two or three 
words in a given paragraph or verse. 


Word Index: Browse through lists of all unique words in 
the texts. 


Citation Searches: Identify works by author and title. 
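
The search types listed above can be illustrated with a short sketch. It is not the software behind the American Verse Project; the three-paragraph `CORPUS` and all function names are hypothetical, and the proximity window is counted in words, which is an assumption of this example.

```python
import re

# A tiny made-up corpus: paragraph id -> text (hypothetical sample data).
CORPUS = {
    "p1": "The woods are lovely dark and deep",
    "p2": "Deep in the dark woods the river runs",
    "p3": "A lovely river runs to the sea",
}


def tokens(text):
    """Return the lower-case word tokens of a text."""
    return re.findall(r"[a-z']+", text.lower())


def simple_search(phrase):
    """Paragraphs that contain the word or phrase as consecutive tokens."""
    want = tokens(phrase)
    hits = []
    for pid, text in CORPUS.items():
        toks = tokens(text)
        spans = range(len(toks) - len(want) + 1)
        if any(toks[i:i + len(want)] == want for i in spans):
            hits.append(pid)
    return hits


def proximity_search(a, b, window):
    """Paragraphs where words a and b occur within `window` tokens of each other."""
    hits = []
    for pid, text in CORPUS.items():
        toks = tokens(text)
        pos_a = [i for i, t in enumerate(toks) if t == a]
        pos_b = [i for i, t in enumerate(toks) if t == b]
        if any(abs(i - j) <= window for i in pos_a for j in pos_b):
            hits.append(pid)
    return hits


def boolean_and(*words):
    """Paragraphs that contain every one of the given lower-case words."""
    return [pid for pid, text in CORPUS.items()
            if set(words) <= set(tokens(text))]


def word_index():
    """Alphabetical list of all unique words in the corpus."""
    return sorted({t for text in CORPUS.values() for t in tokens(text)})
```

For instance, `simple_search("dark woods")` matches only the paragraph where the two words are adjacent, whereas `boolean_and("dark", "woods")` matches any paragraph containing both words in any order.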


Project Bartleby 
(http://www.columbia.edu/acis/bartleby/index.html) : This is Columbia University’s online 
publishing project, which includes the full text of selected poetry, 
Bartlett’s Familiar Quotations, inaugural addresses of U.S. 
Presidents and more. Texts can be 
searched by words or phrases. 


ALEX (http://sunsite.berkeley.edu/alex/) : Alex was 
originally conceived by Hunter Monroe in 1993-94 as a 
catalogue of, and access point to, electronic texts. The catalogue 
contains roughly 2,000 texts, mostly online. The catalogue is 
accessed by author, date, host, language, subject, or title. 
This site also contains a section with information about 
cataloguing Internet resources. 
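
Access to a catalogue by several fields, as described for Alex, can be sketched as one lookup table per field. This is a minimal illustration only; the sample records, the `example.org` hosts and the function names are hypothetical and are not taken from the Alex catalogue itself.

```python
from collections import defaultdict

# The six access points named in the text.
FIELDS = ("author", "date", "host", "language", "subject", "title")

# Hypothetical sample records; not actual entries from the Alex catalogue.
RECORDS = [
    {"author": "Austen, Jane", "date": "1813", "host": "example.org",
     "language": "English", "subject": "Fiction",
     "title": "Pride and Prejudice"},
    {"author": "Austen, Jane", "date": "1815", "host": "example.net",
     "language": "English", "subject": "Fiction", "title": "Emma"},
    {"author": "Voltaire", "date": "1759", "host": "example.org",
     "language": "French", "subject": "Satire", "title": "Candide"},
]


def build_indexes(records):
    """Build one index per field, mapping a lower-cased value to its records."""
    indexes = {f: defaultdict(list) for f in FIELDS}
    for rec in records:
        for f in FIELDS:
            indexes[f][rec[f].lower()].append(rec)
    return indexes


def lookup(indexes, field, value):
    """Return the titles of all records whose `field` equals `value`."""
    return [rec["title"] for rec in indexes[field].get(value.lower(), [])]
```

A call such as `lookup(build_indexes(RECORDS), "host", "example.org")` then returns the titles held on that host, and the same pattern serves each of the other access points.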


Arts Wire CURRENT 
(http://artswire.org/Artswire/www/current.html) : Arts Wire CURRENT features news 
updates on social, economic, philosophical, and political 
issues affecting the arts and culture. 


Artvoice.com (http://www.artvoice.com/) : Buffalo, New 
York’s alternative arts news source features articles, visuals and 
events listings on poetry, fiction, film, theatre, jazz and 
galleries. 


BBC Education (http://www.bbc.co.uk/home/today/index.shtml) 
and (http://www.bbc.co.uk/learning/colleges) : 
BBC Education has lots of practical advice on how to make 
the most of your work, study and life, whether 
you are at school, going to college, leaving 
for university or joining the world of work. Visit the Subject 
Listing for all of the learning sites listed by subject. From articles 
written by leading academics to interactive games, just click 
on a subject to explore. The user can discover a new way of 
learning with AS Guru and study English, Maths, Biology and 
General Studies with the site’s unique combination of 
interactive assignments, TV and printed material. It also 
provides links to WebGuide, an impartial guide to the best 
educational websites on the internet. 


Cultural Resource Management (CRM) 
(http://www.cr.nps.gov/crm/crm-hom.htm) : CRM is published by 
the National Park Service. This site contains information and 
articles on the history, significance and issues around park 
resources, preservation and related local, national and 
international initiatives, training and conferences. 


National Public Radio (http://www.npr.org/) : National 
Public Radio was founded on February 24, 1970, with 90 
public radio stations as charter members. Today, NPR serves 
a growing audience of more than 15 million Americans each 
week via 620 public radio stations and the Internet, and reaches 
listeners in Europe, Asia, Australia and Africa via NPR Worldwide, 
military installations overseas via American Forces Network, 
and Japan via cable. In its 30 years, NPR has 
won every major award in journalism for news and cultural 
programming in America. 


CURRENT CONTENTS OF JOURNALS 


Current contents of journals are valuable resources 
that help users in many ways. For instance, these 
resources help users to stay up to date in their research and 
provide a complete picture of today’s global research in arts 
and humanities by combining comprehensive coverage with 
numerous access points and optional coverage of past research, as ISI’s 
Current Contents/Arts and Humanities does. Above all, such 
resources are an excellent aid in saving research time. 
Examples include: 

Current Contents/Arts and Humanities 
(http://www.isinet.com/isi/products/cc/editions/ccah/) : ISI Current 
Contents: Arts and Humanities provides access to complete 
bibliographic information from articles, editorials, meeting 
abstracts, commentaries, and all other significant items in 
recently published editions of over 1,120 of the world’s leading 
arts and humanities journals and books from a broad range 
of categories. Besides the regular features 
of such a service, the source combines comprehensive 
coverage with numerous access points and exclusive search 
capabilities, offers optional coverage of past research, and saves research time 
by providing one source for a variety of research data, 
including author abstracts, author addresses, and more 
information per bibliographic record than other resources do. 


California Digital Library-Current Contents 
(http://www.cdlib.org/cgi-bin/search_title?title=Current+Contents) : 
The Current Contents Article database (CC) contains records 
for journal articles in the arts, humanities, physical, social, 
and biological sciences, and other fields. Article citations in 
the CC database are indexed by the Institute for Scientific 
Information, Inc. The database includes citations indexed from 
July, 1989 to the present, representing publication dates since 
early 1989. It is also possible to search by title for periodicals 
held by the UC campuses and other California libraries. 


Current Contents Search 
(http://library.dialog.com/bluesheets/html/bl0440.html) : It is a weekly service that 
reproduces the tables of contents from current issues of 
leading journals in the arts and humanities, social sciences, 
and sciences. 


Current Contents / Arts & Humanities 
(www.garfield.library.upenn.edu/essays/v4p009y1979-80.pdf) : 
Current Contents: Arts & Humanities is a service 
initiated to keep users in this subject area up to date. The 
service was started in 1979 and even at that time covered 
almost 1000 journals in various areas of Arts and Humanities. 


E-ZINES 


E-Zines are electronic magazines. There are several 
such resources available on the Internet. Examples include: 


Eserver.org (http://eserver.org/) : It is a searchable 
resource for humanities from Carnegie Mellon University. It 
includes a list of links pages that contain upwards of 30 links 
each. It also provides links to some on-line e-zines. Some others 
include OneWorld Magazine 
(http://www.envirolink.org/oneworld/toc.html); (Re)Soundings 
(http://www.millersv.edu/~resound/); and Slate (http://slate.msn.com/). 


ON-LINE WRITING GUIDES/LABS 


Such resources are very valuable for the Humanities 
area as ready reference tools that 
provide tips and advice for writing various types of essays, 
constructing an argument, and the like. Examples include: 


Australian Defence Force Academy’s Essay Writing 
Guide (http://www.pol.adfa.oz.au/resources/essay_writing/contents.html) : 
It is a thorough, indexed handbook on writing 
essays, with helpful explanations and advice. Resource 
references do not contain hyperlinks, which means you need 
to note the site addresses and enter them yourself to reach 
the documents online. 


University of California at San Diego Revelle School 
for Humanities Writing Resources 
(http://iacs5.ucsd.edu/~hu3f/writing/writing.html) : It is a drop-menu of internal 
(UCSD) documents giving advice on various subjects, from 
constructing an argument and supporting evidence to word 
choice. It also has external links to writing resources. 


Paradigm On-line Writing Assistant 
(http://www.powa.org/) : It is a thorough site including information 
on how to write argumentative essays, personal essays 
including nonfiction narrative, and exploratory essays. 


Purdue On-line Writing Lab 
(http://owl.english.purdue.edu/) : The main appeal of this page is 
its “handouts” section, indexed by topic, that includes 
exercises and answer keys to supplement the guidelines. This 
site also contains links to several other writing resources and 
is a good starting point for writing. The design is clean and 
simple. 

University of Kansas Writing Resources 
(http://www.ukans.edu/~writing/resources.html) : It is an extensive 
index of topics and helpful documents. 


Claremont Graduate University’s Writing Centre 
(http://www.cgu.edu/resources/Wrtctr/Resources/) : An elementary 
explanation of some of the fundamentals of writing in the 
humanities is available on this site. 


Writing Argumentative Essays 
(http://www.esliplanet.com/teachertools/argueweb/frntpage.htm) : 
It is a thorough guide for writing argumentative essays. 


Writing a Basic Essay 
(http://members.tripod.com/~lklivingston/essay/) : It is a guide to writing a very basic 
essay. It may also be useful if you are feeling absolutely lost 
and need to bone up on the basics. 


Colorado State University’s Writing Centre 
(http://www.colostate.edu/Depts/WritingCenter/reference.htm) : 
It is a highly organized, guided approach to writing, reading 
and speaking/presenting for academics. 


Guide to Grammar and Writing 
(http://webster.commnet.edu/HP/pages/darling/grammar.htm) : It 
is an on-line grammar guide with indexed topics and a 
searchable database. It includes a thorough guide to writing 
at the sentence, paragraph and essay levels. 


University of Illinois at Urbana-Champaign Writing 
Techniques Handbook 
(http://www.english.uiuc.edu/cws/wworkshop/techniquesmenu.html) : It has useful guidelines 
for topics ranging from writing about film, writing about poetry and 
writing a thesis to how to overcome writer’s block. The links 
page may be particularly helpful, including a link to UIUC’s 
Hypertext guide and several other writing labs. 


University of Wisconsin On-line Writers’ Handbook 
(http://www.wisc.edu/writetest/Handbook/) : It provides 
helpful and detailed tips and information on how to write all 
sorts of documents, including personal essays and 
statements. It also provides links to other useful resource sites, 
such as a list of American university writing labs 
(http://owl.english.purdue.edu/owls/writing-labs.html); a list of 
Canadian universities, alphabetical and by province 
(http://www.uwaterloo.ca/canu/); and a list of American universities 
by state (http://www.utexas.edu/world/univ/state/). 


INDEXING AND ABSTRACTING SERVICE RESOURCES 


Indexing and abstracting resources provide a rich 
resource base in various subject areas. These resources 
provide links to specified resource types available on the 
Internet. These services work in the same way as printed 
indexing and abstracting resources, with the difference that 
while printed indexing and abstracting resources guide you 
to specified resources that you need to look for and get from 
a library or information centre, the online sources give 
you the option of obtaining the source document with just a click of 
the mouse on the link. However, the document may 
be available free or against a fee. There are several such 
resources available via the Internet. Examples include: 


Arts and Humanities Citation Index 
(http://www.webofscience.com/) : This is a part of the Web of 
Science and is available from 1975 onwards. The database 
provides author, keyword, and cited reference searching from 
more than 1100 leading arts and humanities journals. In many 
cases, it includes searchable abstracts plus links to e-journal 
images. 


Wilson Humanities Abstracts 
(http://www.silverplatter.com/catalog/whab.htm) : Wilson 
Humanities Abstracts, from the H.W. Wilson Company, 
provides comprehensive abstracting from 1994 and indexing 
from 1984 of 500 English-language periodicals covering the 
areas of archaeology, classical studies, art, performing arts, 
philosophy, history, music, linguistics, literature, and religion. 
An essential reference tool for serious research in the diverse 
subject areas of the humanities, Humanities Abstracts opens 
doors to a wealth of specialized information and is ideal for 
graduate students and professionals focusing on complex 
research projects and librarians needing quick answers to 
specific questions. It has no Online Equivalent, while its print 
equivalent is Humanities Index. 


SUBJECT DATABASES 


The subject databases provide comprehensive
information on a specified subject area. There are a large
number of subject databases available on the Internet, varying
in their scope, content and coverage. There are also
databases of databases available on the Internet. Because
of the very nature of the Internet, a site that simply links to
various related resources can become a mega site, a one-stop
shop for information on a particular subject area. This is in
contrast to printed resources, where the publisher needs to
collate, organize and include the information in one publication
and finally make it available as a printed document, which is
cumbersome and also less feasible economically. Examples
include:


The OCLC FirstSearch (http://www.oclc.org/firstsearch/) :
The OCLC FirstSearch service connects a world
of libraries to a universe of information. FirstSearch gives
library users instant online access to more than 70 databases,
including these valuable OCLC databases: OCLC WorldCat,
OCLC FirstSearch Electronic Collections Online, OCLC
ArticleFirst, OCLC PAIS International, OCLC PapersFirst,
OCLC ProceedingsFirst, and OCLC Union Lists of
Periodicals. Best of all, library holdings are displayed up front,
so users can easily identify items in their own library’s
collection. OCLC FirstSearch is a comprehensive and
complete reference service with a rich collection of databases
and with links to the World Wide Web, over 10 million online
full text articles, full-image articles from over 4,000 electronic
journals, library holdings, and interlibrary loan. It supports
research in a wide range of subject areas with well-known
bibliographic and full-text databases in addition to
ready-reference tools such as directories, almanacs and
encyclopedias.


California Digital Library-Databases
(http://www.isinet.com/isi/products/cc/editions/ccah/) : The
databases that are relevant to this area and hosted by 
California Digital Library are: Database — Area, 
Interdisciplinary, and Ethnic Studies; Database — Area, 
Interdisciplinary, and Ethnic Studies — General; Database — 
Arts and Humanities; Database — Arts and Humanities — 
General; Database — General Interest and Reference; 
Database — Databases and Indexes; Database — History; and 
Database — History — General. 


Indiana University-Purdue University Indianapolis-
Arts and Humanities (http://www.ulib.iupui.edu/erefs/a_h.html) :
A huge number of databases in the area of arts and humanities
are available for access at this site. Some
resources may be available only within the University Library
building. The long list of databases distinguishes between
databases available on CD-ROM for in-library use
only, databases available at public workstations in libraries
only, and databases available without restriction.


COMMERCIAL DATABASES 


A number of commercial databases, which are generally
fee based, are available on the Internet. These databases
are generally set up with a definite purpose and a
target user population for information access and retrieval.
These resource bases cover quality resources, since they
need to be authentic and competitive; otherwise people
will not buy access to them.


One or more of these commercial databases available 
for use by Michigan residents may also be useful when looking 
for information in this subject category. 


Ancestry Plus available at
http://infotrac.galegroup.com/itweb/lom_yourlocationIDhere?db=APLUS.


Art Abstracts available at
http://firstsearch.oclc.org/timeout=900;done=referer;dbname=ArtAbstracts;FSIP.


Book Review Digest available at
http://firstsearch.oclc.org/timeout=900;done=referer;dbname=BookRevDigst;FSIP.


Books In Print available at
http://firstsearch.oclc.org/timeout=900;done=referer;dbname=BooksInPrint;FSIP.


Gale General Reference Center Gold (InfoTrac) available at
http://www.accessmichigan.lib.mi.us/iac-lib-search.htm.


Humanities Abstracts available at
http://firstsearch.oclc.org/timeout=900;done=referer;dbname=HumanitiesAbs;FSIP.


Music Literature available at
http://firstsearch.oclc.org/timeout=900;done=referer;dbname=MusicLiterature;FSIP.


Union List of Periodicals Worldcat available at
http://firstsearch.oclc.org/timeout=900;done=referer;dbname=UnionLists;FSIP.


DATA ARCHIVES/DATA SERVICES 


These provide access to archives and are generally
fee based. The advantage of such resources lies in the fact
that, while libraries are constantly under pressure for want
of more space, these services make up for that, since they
maintain the archives at their end while letting you
access them, generally for a fee. Another important
point is that the library does not have to worry about
maintaining the documents against any odds. Thirdly,
access to such archives is available round the clock. However,
there are some difficulties as well: if you discontinue
buying access to such a service, your permission to access
the requisite archive ceases to exist. What, then, happens to
the subscription that you have paid for a specified time period?
With a printed document, you still hold the document for the
years for which you have paid the subscription and acquired
it. In the online set-up, however, many models
are now being proposed and marketed by the data archiving
and marketing agencies, and these vary from agency to agency,
which needs attention. Examples include:


JSTOR (JSTOR.org) : It provides the full texts of articles 
relating to ecology, economics, education, finance, history, 
mathematics, political science, and population studies. The 
resources are listed by subject and by title also. 


Arts and Humanities Data Services
(http://www.ahds.ac.uk/) : The Arts and Humanities Data Services
(AHDS) is a UK national service funded by the Joint
Information Systems Committee and the Arts and Humanities
Research Board. It is organised via an Executive at King’s
College London, and five service providers from various
Higher Education institutions. AHDS helps you to create,
deposit, preserve or discover and use digital collections in
the arts and humanities. On this site you can find out about
the work of the AHDS, participate in their training events,
consult their publications, and search their wide-ranging
collections. To visit the AHDS collections, you need to select
from their subject teams as indicated below:


— Archaeology Data Service 
(http://ads.ahds.ac.uk/welcome.html)

— Visual Arts Data Service 
(http://vads.ahds.ac.uk/) 

— Oxford Text Archive 
(http://ota.ahds.ac.uk/) 

— Performing Arts Data Service 
(http://www.pads.ahds.ac.uk/) 

— History Data Service 
(http://hds.essex.ac.uk/) 


In addition, the Political Studies Association’s Data Archives
page (http://www.psa.ac.uk/www/archives.htm) provides
links to several data archives, such as Archives Hub
(http://www.natcen.ac.uk/cass/), which is a JISC-funded gateway
to descriptions of archives in UK universities and colleges.


DOCUMENT DELIVERY 


ISI Document Solution
(http://www.isinet.com/isi/products/ids/ids/index.html) is a
document delivery service that
provides access to full-text items from virtually any publication
within or outside of the ISI database. This flexible, fast, and
convenient service provides wide coverage, reliable customer
support, fast and varied delivery options, and efficient
management tools. The system offers:


• One-stop shopping, with comprehensive coverage that
  includes documents from research journals, conference
  proceedings and papers, book chapters, technical
  reports, government reports, annual reports, standards,
  and monographs.


• Fast access to the information researchers need:
  orders for documents in the ISI collection are processed
  within 24 hours of receipt, and delivery options
  include fax (30-minute delivery upon request),
  courier, or traditional mail service.


• Easy tracking of orders and accounts, via a
  professional customer support staff that helps to verify
  order status and confirm order pricing and payment
  balance; and


• Easy record keeping through the elimination of hidden costs:
  all materials have a standard processing fee and a variable
  copyright fee, and are copyright cleared.


COURSE RESERVES/TEACHING RESOURCES/ 
TRAINING MATERIALS 


Many academic institutions have put their course 
reserves/teaching and training materials on the web and 
provide access to such materials. These resources are very 
useful as the base material for the student community as well 
as the faculty. Examples include: 


World Lecture Hall: Humanities
(http://wnt.cc.utexas.edu/~wlh/search/results.cfm?count=1&from=browse&DescriptorID=43) :
It is a site developed by the
University of Texas at Austin, which provides a set of
annotated links to online courses, course descriptions,
tutorials, assignments, tests and other materials used in the
teaching of humanities and cultural studies.


Course Reserves - UCSD Libraries
(http://libraries.ucsd.edu/services/reserves.html) : This is part of the
Campus Course Materials Services of UCSD. Course
materials are made available by several different services on
campus. These services are working together to try to simplify
access to course materials, while providing a wide variety of
options to best serve students and faculty. Course reserves
are categorized by library as well. For instance, links have
been provided to the following course reserves by library:


— Art and Architecture Library
(http://aal.ucsd.edu/reserves/);
— Film and Video Reserves
(http://orpheus.ucsd.edu/fvl/RESERVES.HTM);
— Social Sciences and Humanities Library
(http://sshl.ucsd.edu/reserves/).


DISCUSSION LISTS OR FORUMS/USENET 
NEWSGROUPS/MAILING LISTS/BULLETIN BOARDS 


The important ones are listed below.


HUMBUL-Humanities Bulletin Board
(http://users.ox.ac.uk/~humbul) : It is a gateway site, maintained
by Chris Stephens at Oxford University, which lists the best
Internet resources in the humanities and also offers a
conference diary. Search or browse by clicking on 19 broad
categories in the arts and humanities. It is a good starting point
for identifying quality web resources in this area.


Humanist Discussion Group
(http://www.princeton.edu/~mccarty/humanist/humanist.html) : It
is exclusively for people working in or interested in
humanities-related discussions and deliberations.


Besides, there are also a number of directories of
newsgroups and mailing lists available, such as:


Deja.com (http://www.deja.com) : This site searches
a vast number of discussion forums and Usenet groups,
including archives of previous postings. The new Deja.com
now also aims to serve as an Internet consumer guide, with
ratings of products, and Deja Tracker informs you by e-mail
about new postings in your favourite newsgroups.


Neosoft (http://www.neosoft.com/internet/paml) : It is 
a huge directory of publicly accessible mailing lists and Usenet 
news groups, etc., with details about traffic and how to join. 


Mailbase (http://www.mailbase.ac.uk/lists.html) : This
site gives access to over 2,000 electronic discussion lists
for the UK higher education and research community. You
can select by broad subject fields in the arts and humanities,
sciences, health studies, or social sciences, and find out how
to join any of these lists.


RESEARCH RESOURCES/ RESEARCH PROJECTS 


Research resources/research projects provide a good
starting point for researchers and academicians in any subject
area to navigate what is broadly available and locate sources
in their area of research. Such resources also provide links
to tools that help in research work in the area of Humanities.
Examples include:


Voice of the Shuttle (http://vos.ucsb.edu) : It is possibly
one of the most comprehensive sources for humanities
research on the net. This site has a large index to Internet
resources, divided into headings such as literature, culture,
gender issues, minorities and religious studies, as well as
more common humanities subjects. It is developed and
maintained by the English Department, University of California,
Santa Barbara.


Asia Resources on the World Wide Web
(http://www.aasianst.org/asiawww.htm) : Hosted by the
Association for Asian Studies, it has an extensive list of online 
resources on Asia in general, East Asia, Southeast Asia and 
South Asia, together with links to journals/newspapers, 
dictionaries, libraries, videos and art. 


Buddhist Studies WWW Virtual Library
(http://www.ciolek.com/WWWVL-Buddhism.html) : Edited by T.
Matthew Ciolek and Joe Bransford Wilson, this site keeps 
track of the major online sites and resources on Buddhism 
and Buddhist studies, including Web sites, databases, mailing 
lists, electronic newsletters and journals, and other networked 
resources. 


International Database of Digital Humanities Projects
(http://www.ninch.org/programs/data/) or
(http://ahds.ac.uk/trends.htm#) : The US organization NINCH
(National Initiative for a Networked Cultural Heritage) has recently
helped set up an online international database with
information on humanities computing projects. The database
is a response to the call for peer-reviewed information on
humanities computing projects that would focus as much on
research, methodology and software as on “product.” That
is, the database is not primarily a listing of all available
resources for the humanist (as is, for example, the Humbul
Humanities Hub of the UK’s Resource Discovery Network),
but rather a tool for working scholars and funders to track
the work done in a given area and to find reusable resources.


The University of Michigan, Rice University Library and 
the University of Virginia have contributed personnel and 
resources; their professional library cataloguers ensure 
consistency and reliability of information. The database 
prototype, available from 2002, was seeded with data from 
the National Endowment for the Humanities and the Getty 
Grant Program. 


Online Reference Works
(http://www.cs.cmu.edu/references.html) : Dictionaries, thesauri, encyclopaedias,
place-oriented references, etc., are available on this site. 


Reference Works (http://digital.net/~klane/ref.html) : 
This site provides good links and resources to freely 
accessible online reference works. 


Research-It! (http://WWW.iTools.COM/research-it) :
This gives access to dictionaries and acronym converters.
You can look up words with many language tools and translate
words to and from various languages; there are also library,
biographical, geographical and financial tools, and US and
Canadian telephone numbers.


Symbols.Com (http://www.symbols.com) : It is an 
online encyclopaedia of over 2,500 Western signs and 
symbols; downloadable, with graphic and word index. 


EVENTS AND CONFERENCES 


Such sites provide a great source for tracking events
and conferences in the specified subject areas. Examples
include:


Australian Humanities Review - Conferences
(http://www.lib.latrobe.edu.au/AHR/goodo/home.html) : Australian
Humanities Review is a peer-reviewed interdisciplinary
electronic journal founded by Cassandra Pybus. It is published
quarterly with regular updates every two weeks. The journal
is a good source of information for forthcoming conferences
in the area of Humanities.


CultureFinder (http://www.culturefinder.com/index.htm) :
This site locates arts events in the U.S. by place
and date.


ONLINE MULTIMEDIA PROJECTS AND EXHIBITS 


The multimedia projects and exhibits have special
significance as a resource for the humanities. Multimedia
basically means a combination of audio, visuals, text, graphics,
3D visuals, etc., all on a single platform, which facilitates
the virtual display of a product, resource and the like.
Examples include:


Alive TV (http://www.ktca.org/alive/season12.html) :
Alive TV presents the work of artists whose “work speaks to
us through its unflinching portrayal of challenges overcome
through sheer human spirit”.


The Electronic Academic Village
(http://jefferson.village.virginia.edu/home.html) : An intriguing
mixture of texts, hypermedia projects, technical and research 
reports, and other projects which apply technology to the arts, 
humanities and social sciences. It is developed by the Institute 
for Advanced Technology in the Humanities, University of 
Virginia. 

Humanities-Interactive
(http://www.humanities-interactive.org/) : Presented by Texas Humanities Resource
Center, Texas Council for the Humanities, this site presents 
about fifty web exhibitions that cover the scope of human 
civilization and culture in graphical presentations. Other 
current exhibits include: “Border Studies”; “Bonfire of Liberties: 
Censorship of the Humanities”; “Newscast from the Past: June 
15, 1215” MPEG movie clips. 


UCSC Humanities Division
(http://humwww.ucsc.edu) : It provides links to internet projects on
Dante’s Divine Comedy; Dickens; CineMedia; Satyajit Ray; 
California Writing Project; The Virtual Mexico Project, etc. 


Library of Congress Online Exhibits
(http://lcweb.loc.gov/exhibits/) : The site is an invaluable
resource pertaining to art exhibits and provides access to
Exhibitions currently on display; Treasure-Talks; Exhibitions
currently on tour; and many more resources. This is a good
site for artists and art lovers.


Culture Kiosk (http://www.culturekiosque.com/) : It is
a European Arts and Culture site.


China the Beautiful
(http://www.chinapage.com/china.html) : This site includes classical Chinese art,
calligraphy, history, literature, painting, poetry and philosophy. 


Indiana University-Purdue University Indianapolis
(http://www.ulib.iupui.edu/special/) : This site provides links
to special collections and archives, such as Collections and
Exhibits; Online Exhibits; University Archives; Manuscript
Collections; Philanthropy Collections; and German-American
Collections, besides General Collections.


CAMPUS WIDE INFORMATION SYSTEMS (CWIS)/ 
UNIVERSITY DEPARTMENTS 


As part of the Campus Wide Information Systems, a
number of University Departments provide an excellent forum
for accessing Humanities Resource collections. Examples
include:


University of Massachusetts - English Department
Links (http://www.umass.edu/english/links.html) : This is
an excellent source of all types of English literature resources 
available on the WWW. The site provides links to the following 
resources: 


English Departments on the Web
(http://www.umass.edu/english/links.html#ENGDPTS)
Courses and Course Material
(http://www.umass.edu/english/links.html#COURSE)
Professional Resources
(http://www.umass.edu/english/links.html#PRO)
Guides to Literature and Humanities Resources
(http://www.umass.edu/english/links.html#Lit)
Online Journals and Other E-Texts
(http://www.umass.edu/english/links.html#JOURN)
Publishers of Educational and Academic Software and Books
(http://www.umass.edu/english/links.html#PUB)
Online Writing Labs (OWLS) and Tutorial Sites
(http://www.umass.edu/english/links.html#OWLS)
MOOs and Interactive Resources
(http://www.umass.edu/english/links.html#MOOS)
Information on the Web, OIT, and the Internet
(http://www.umass.edu/english/netaid.html)
Special Interest Resources
(http://www.umass.edu/english/links.html#SIR)


Division of the Humanities, University of Chicago
(http://humanities.uchicago.edu/humanities/) : The site
provides a lot of resources in the Humanities; Humanities
Journals; Humanities Organizations; Centers for Area
Studies; and Humanities-Related Online Resources.


UCSB Library Humanities Collections
(http://www.library.ucsb.edu/subj/humanit.html) : UCSB
humanities librarians and faculty are working together to
identify long-term library needs for the humanities, including
services, space, equipment, etc. The documents here give
an overview of how the group began as well as some of the
pressing needs expressed by faculty and librarians. Provision
is also made for users to receive
notification via e-mail about updates in their area of interest.


It is a good site for humanities resources covering Art
& Architecture; Classics; Dance; Dramatic Art; English; Film
& Television; French; German; History; Italian; Linguistics;
Music; Philosophy; Religion; Spanish & Portuguese; and U.S.
History. There are a large number of other resources available
at this site in the area of Humanities.


U.C. Berkeley and Internet Resources- Humanities
and Area Studies
(http://www.lib.berkeley.edu/Collections/acadtarg.html) :
It is a rich resource base on African American
Collections, which are concentrated chiefly in the humanities
and social sciences. The African American Collections are
very strong in microfilm editions of important research
materials, including personal papers of prominent African
Americans and other historical records, such as the Papers
of the NAACP. This page lists selected Internet resources
for African American studies. Besides including resources at
U.C. Berkeley, it also provides links to Resources at other
Institutions and On the Internet; Organizations and
Associations; and Electronic Journals.


IUPUI University Library (Indiana University-Purdue University
Indianapolis) (http://www.ulib.iupui.edu) :
The IUCAT library catalogue search covers records
for more than 5 million items held by the Indiana University
Libraries statewide. The site also provides access to other Library
Catalogues; Library databases and Electronic Journals;
General Reference Sources; ERROL: Electronic Course
Reserves; Subject Resource Guides; E-mail and Application
Software; Video and Television Resources; the Joseph and
Matthew Payton Philanthropic Studies Library; and Special
Collections and Archives.


Besides, University of California, Berkeley, College of
Letters & Science, Humanities Division is available at
http://ls.berkeley.edu/divisions/hum/.

UCLA Humanities (http://128.97.154.196/) : This is the
Website of the University of California, Los Angeles,
Humanities Departments, including electronic journals and
online computing resources.


UCSC Humanities Division
(http://humwww.ucsc.edu) : This is the Website of the University of
California, Santa Cruz, Humanities Departments, including 
humanities projects and events. 


DISSERTATIONS 


Dissertations also form a very important resource for
humanities research. There are a large number of sites
available on the Internet that help researchers not only
to locate dissertations, and descriptive abstracts of
dissertations, available in digital format on the Internet,
but also to locate sites that assist with
quality thesis and dissertation research, writing and editing
(www.Bear-Write.com), (www.360-thesis-writing.com), and
with locating Stats Advisors (www.dissertationadvisors.com) and
Dissertation Experts (www.TheDissertationExperts.com).
Examples include:

UMI’s ProQuest Digital Dissertations
(http://www.lib.umi.com/dissertations/gateway) : The best and
perhaps the largest dissertation source available is UMI’s
ProQuest Digital Dissertations (PQDD). With more than 1.6
million entries, the Dissertation Abstracts database is the
single, authoritative source for information about doctoral
dissertations and master’s theses. The database includes
bibliographic citations for materials ranging from the first U.S.
dissertation, accepted in 1861, to those accepted as recently
as the previous semester. Citations for dissertations
published from 1980 forward also include 350-word abstracts
written by the author. Citations for master’s theses from 1988
forward include 150-word abstracts. The full text of more than
one million of these titles is available in paper and microform
formats. Institutional subscribers to ProQuest Digital
Dissertations receive on-line access to the complete file of
dissertations in digital format starting with titles published from
1997 onwards.


SOFTWARE ARCHIVES 


Text Analysis Info Page (http://www.textanalysis.info)
is a website that provides information on text analysis and
especially software for the analysis of human communication
content. This content is mostly text, but not exclusively: quite
a few programs can handle audio and/or video data. Starting with
the first version in April 1999, this site has become quite
popular. There is a great deal of information and
several links to various resources available at this site,
ranging from Conferences, Workshops & Forums; Mailing
Lists; News to Text Archives; Books and Regressive Imagery
Dictionary and of course to Software (Classification of Text;
Analysis Software; Definitions and Terms; Transcribing
Software for audio/video); Language/Linguistics Information
Retrieval and much more.


REPORTS 


Research Reports are available at
http://jefferson.village.virginia.edu/researchProjects.html. The Institute
for Advanced Technology in the Humanities (IATH) site covers
theoretical, speculative, and documentary essays written by
IATH fellows and staff about IATH projects and the issues
they raise, or about electronic scholarship and culture more
generally. Unless otherwise noted, items published by the
Institute for Advanced Technology in the Humanities are
copyrighted by the authors and may be shared in accordance
with the Fair Use provisions of U.S. copyright law.
Redistribution or republication on other terms, in any medium,
requires express written consent from the author(s) and
advance notification of the publisher. The Institute for
Advanced Technology in the Humanities derives its specific
software projects from the needs of its fellows and their
projects. In general, its goal is to produce software
that will be broadly useful in the humanities computing world,
that runs over the internet in conjunction with the Web, and
that addresses problems not likely to be solved by those who
develop software for business and entertainment.


DATA CENTRES 


The Data Center at Alexander Library is a unique
resource for Rutgers faculty, students, and other scholars.
The Humanities and Social Sciences Data Center is available
at http://www.scc.rutgers.edu/datacenter/humanities/about.htm.
It facilitates access to the Rutgers University
Libraries’ Government, ICPSR (Inter-University Consortium
for Political and Social Research) and networked CD-ROM
data, while also serving as a clearinghouse for existing or
newly created data collections located elsewhere.


The Humanities component of the Data Center is 
developed jointly by the SCC and CETH, the Center for 
Electronic Texts in the Humanities. The Data Center’s holdings
provide access to a number of full-text databases, such as
the Packard Humanities Institute, the Thesaurus Linguae 
Graecae, the Oxford English Dictionary, CETEDOC, and 
Letteratura Italiana Zanichelli. It also offers a convenient point 
of access to full-text Web-based databases, for which it 
provides supplementary documentation and tutorials. Some 
of these databases are the African American Poetry 
Database, the English Poetry Database, ARTFL, and the 
Dartmouth Dante Project. 


WORKING PAPERS 


CH Working Papers, or Computing in the Humanities
Working Papers, are an interdisciplinary series of refereed
publications on computer-assisted research, which are
available at http://www.chass.utoronto.ca/epc/chwp/. They
are a vehicle for an intermediary stage at which questions of
computer methodology in relation to the corpus at hand are
of interest to the scholar before the computer disappears into
the background. CHWP groups its publications
under the following categories: articles appearing for the first
time; postprints, articles that were originally published in print
form; preprints, articles that have been accepted for
publication by print journals and that will either be withdrawn
when published in print or become postprints; essays on the
epistemology and sociology of computer-assisted research
relevant to computing in the humanities; non-refereed
experimental papers that exploit those properties of the
electronic medium that are significantly different from the
properties of print; and mutanda, moderated but not refereed. Each
article is accompanied by an abstract in both English and
French.


REFERENCE SOURCES 


Reference sources have been categorized in two
groups, as listed below, based on the nature of their coverage.


• Specific Reference Resources
• General Reference Resources


Specific Reference Resources on Humanities
(http://www.library.siue.edu/lib/info/refhem.html) are depicted in
Table 5.2.


Table 5.2. Specific Reference Sources


| Database/Vendor | Subjects | Contents | Coverage | Dates/Updates |
|---|---|---|---|---|
| ABELL (1920-2001), Chadwyck-Healey | Monographs, literary works, book reviews, essays, doctoral dissertations, and journal articles on English language and literature | Citations | Annual Bibliography of English Language and Literature, and full text of 71 related journals | 1920-2001; Annual |
| African-American Poetry (1750-1900), Chadwyck-Healey | Poetry | Text | 3,000+ poems written by African-American poets | 1750-1900 |
| American Poetry (1750-1900), Chadwyck-Healey | Poetry | Text | 4,000+ poems written by 200+ American poets | 1600-1900 |
| Canadian Poetry, Chadwyck-Healey | Poetry | Text | 12,000+ poems, 142 Canadian poets | 1600-ca. 1930 |
| English Poetry, Chadwyck-Healey | Poetry | Text | 165,000+ poems written by 1,250+ English poets | 600-1900 |
| English Poetry (2nd Ed.), Chadwyck-Healey | English poetry | Text | 183,000+ poems by 2,700+ English poets | 8th to 20th centuries |
| Essay and General Literature Index, SilverPlatter | Humanities and social sciences, economics, political science, history, philosophy, religion, literary criticism, drama and film | Citations | 3500+ English language essay collections and anthologies | 1985+; Annually |
| Faber Poetry Library, Chadwyck-Healey | Poetry | Text | 31 poets, 91 volumes published by Faber and Faber | 1925+ |
| OED Online, Oxford University Press | The ultimate authority on English language words and quotations | Text | Complete content of the 23-volume Oxford English Dictionary, Second Edition | 1997; Quarterly |
| Twentieth Century African-American Poetry, Chadwyck-Healey | Poetry | Text | 10,000+ poems written by 70 influential and important African-American poets | 1900-2000 |
| Twentieth Century American Poetry, Chadwyck-Healey | Poetry | Text | 52,000+ poems written by 104 poets | 1900-2000 |
| Twentieth Century English Poetry, Chadwyck-Healey | Poetry | Text | 598 volumes by 283 poets | 1900-2000 |


General Reference Resources include the following.


Web of On-line Dictionaries
(http://www.facstaff.bucknell.edu/rbeard/diction.html) : It links to
on-line dictionaries in languages from Aklon to Zulu, as well
as multilingual dictionaries and other language tools.


Common Errors in English (http://www.wsu.edu/ 
~brians/errors/errors.html) : This is an alphabetical list of 
common errors as well as common “non-errors” and a section 
that links with other pages and sites that deal with use and 
misuse of language. 


Commonly Confused Words (http://www.pnl.gov/ag/ 
usage/confuse.html) : It is a scrolling alphabetical list with 
explanations. 


World Wide Words (http://www.clever.net/quinion/ 
words/) : It is a site dedicated to words and their meanings. 


Webster Guide to Grammar and Writing
(http://webster.commnet.edu/HP/pages/darling/grammar.htm) : It
is an extensive, well-organized and indexed grammar reference
site with an essay guide.

The Internet for Writers by Charles Deemer
(http://www.teleport.com/~cdeemer/syl5-home.html) : It explains
the types of resources (e.g. e-mail, chat, listservs, the web)
available to writers, gives instructions on how to use them,
and points out some common pitfalls.


Dictionary.com (http://www.dictionary.com) :
Dictionary.com is produced by Lexico Publishing Group, LLC
(http://www.lexico.com/), a leading provider of language
reference products and services on the Internet. This site
provides links to various dictionaries, so that several
dictionaries can be searched from one place. To use the
dictionary, simply type a word in the blue search box that
appears at the top of every page and then click the ‘Look it
up’ button. This performs a search for the word in the several
dictionaries hosted on this site; the list of all dictionaries
covered appears on the homepage. If you do not know how to
spell the word, just guess, and you will get a list of
suggestions if you are wrong.
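The "guess and get suggestions" behaviour described above can be illustrated with a minimal Python sketch. It is not how Dictionary.com itself works, which the source does not describe; it only assumes a small, hypothetical headword list and uses the standard library's difflib module to find headwords that look similar to a misspelt query.

```python
from difflib import get_close_matches


def suggest(word, vocabulary, limit=3):
    """Return up to `limit` spelling suggestions, or [] if the word is known."""
    if word in vocabulary:
        return []  # correctly spelt: nothing to suggest
    # cutoff=0.6 is difflib's default similarity threshold
    return get_close_matches(word, vocabulary, n=limit, cutoff=0.6)


# Hypothetical headword list; a real dictionary would hold many thousands.
headwords = ["library", "liberty", "binary", "catalogue"]

print(suggest("libary", headwords))   # closest headwords listed first
print(suggest("library", headwords))  # a known word yields no suggestions
```

A production service would typically combine such similarity matching with word-frequency data, but the principle of ranking known headwords by closeness to the typed guess is the same.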


Thesaurus.com (http://www.thesaurus.com) :
Thesaurus.com is produced by Lexico Publishing Group, LLC
(http://www.lexico.com/), a leading provider of language
reference products and services on the Internet. To use the
thesaurus, simply type a word in the gold search box that you
will find on the webpage when you access this site and click
the ‘Look it up’ button. A list of synonyms and antonyms will
be returned.


The thesaurus that appears on this site is Roget's 
Interactive Thesaurus. 


Encyclopedia.com (http://www.encyclopedia.com) :
Encyclopedia.com is the Internet’s premier free
encyclopedia that provides users with more than 60,000
frequently updated articles from the Columbia Encyclopedia.
Each article is enhanced with links to newspaper and
magazine articles as well as pictures and maps, all provided
by eLibrary. eLibrary is a comprehensive digital archive
for information seekers of all ages, where one can search
across 13 million documents from full-text newspaper and
magazine articles, television and radio transcripts,
international newswires, classic books, maps, photographs,
as well as major works of literature, art and reference books.
One can find both current and historical events within the
diverse eLibrary archive.


DIRECTORIES 


Spencer Maybee's Writing Resources in Humanities -
DIRECTORIES, available at
http://www.finearts.uvic.ca/~smaybee/writlinks.html#directories,
provides links to several resources such as Humanities Sites;
Link Pages; On-line Labs; On-line Resources, etc., including
the directories listed below:




• Yahoo Directory: (http://dir.yahoo.com/Arts/Humanities/Literature/).

• Open Directory Project: (http://www.dmoz.org/Arts/Humanities/).


Both directories are excellent starting places for on-line
research into almost any subject. They are searchable, and
organized by levels of specificity, with links at all nodes.
The number of hits relating to a topic appears in parentheses
next to the topic name; if you click on the topic you will
find not only the links to the topic but also several
subtopics, each with its number of links in parentheses.


Cape Cod Community College (CCCC), available at
http://www.capecod.mass.edu/library/websites.htm, gives
links to a large number of resources in various subject areas
including Humanities, and lists various directories in this
area. The site lists Websites recommended by CCCC Librarians.

(B) By Category/Resource Types


Arts and Humanities resources can be categorized into 
four categories based on the types of resources as under. 


• Subject Gateways
• Resource Guides
• Rating System
• Search Tools and Search Tips


SUBJECT GATEWAYS 


Subject gateways are comprehensive sources for identifying
subject-specific resources; they provide pointers to Internet
resources in a given subject area or areas. In the Humanities,
the following are among the best subject gateways for
academicians, researchers or anyone looking for quality,
subject-specific resources. Examples include:


Humbul (http://www.humbul.ac.uk/) : It is one of the best
sources in the Humanities, covering a large collection of
high-quality links to scholarly resources. The
Humbul Humanities Hub is a service of the Resource
Discovery Network funded by the Joint Information Systems
Committee and the Arts and Humanities Research Board,
and is hosted by the University of Oxford. The Humbul
Humanities Hub aims to be the first choice of the UK’s higher
and further education community for accessing online
humanities resources. It provides pointers to resources across
a wide range of areas in the humanities, such as English;
History; Philosophy; History and Philosophy of Science;
Theology; Religion; Modern Languages, etc.


It also facilitates searching by resource type: primary,
secondary or bibliographic. Besides searching by Resource
Type, it also supports searching by Period; Intended
Audience; and All Records.


A new service that has also been launched is My
Humbul Include, which brings Humbul’s resources direct to a
researcher's or lecturer's web page. Through the use of
dynamic html, users can embed a set of selected records 
within their own web page with the minimum of effort. Custom 
annotations can be added to supplement or replace Humbul’s 
own descriptions. Meanwhile Humbul maintains the metadata 
for each record - the link to the resource, the title, Humbul’s 
description, the user’s custom description and more - on 
behalf of My Humbul users. 


Besides, the site includes details about the annual Digital
Resources for the Humanities (DRH) conference, which is the
major forum for all those involved in, and affected by, the
digitization of our cultural heritage. The conference brings
together scholars, teachers, publishers and broadcasters,
librarians, curators and archivists, and computer and
information specialists, providing an opportunity to consider
the latest ideas in the creation and use of digital resources in
all aspects of work in the humanities. Humbul also provides
quick links to resources for Humanities research, as shown
in Table 5.3.


Table 5.3. Humbul Links to various Resources

Bibliographic & Journals: COPAC; British Library Public
Catalogue; ISI Arts & Humanities Index; ZETOC - tables of
contents; BL Inside Service; OCLC FirstSearch; Periodicals
Contents Fulltext; MLA International Bibliography; ASLIB Index
to Theses; Journals via NESLI; Journals via Ingenta; Journals
via JSTOR.

Mapping and Census Data: Digimap - OS map data; Bartholomew
Digital Map Data; UKBORDERS - UK boundaries; Great Britain
Historical GIS; SPOT Satellite Images; The Data Archive; 1981
Census Datasets; 1991 Census Datasets; GENUKI Genealogy
Information Server; Taxatio Database; Ask Giraffe - geospatial
gateway.

Art, Images and Film: Performing Arts Data Service; Visual Arts
Data Service; SCRAN; The Grove Dictionary of Art; Art
Abstracts; Helix Project; BUFVC Television Index; The AVANCE
Database; AXIS Database.

Subject-Specific/Reference: Arts and Humanities Research Board;
The Oxford Text Archive; The History Data Service; The
Archaeology Data Service; The Archives Hub; National Register
of Archives; Public Record Office; BLPES Pamphlet Collection;
Literature Online; Early English Books Online; Perseus Project;
Oxford English Dictionary Online; Jiscmail - email forums; UK
Mirror Service (downloads); CHEST Directory.


The other facilities are pointers to other related services 
such as Resource Discovery Network; Learning and Teaching 
Support Network; and Arts and Humanities Data Service. 


BUBL LINK / 5:15 - Humanities Links
(http://bubl.ac.uk/link/h/humanitieslinks.htm) : This is a mega
site that provides links to different hubs or portals in the
area of Humanities.

These hubs or portals are:

Arts and Letters Daily (http://www.cybereditions.com/aldaily/);
Edsitement (http://edsitement.neh.fed.us/);
EncycloZine (http://encyclozine.com/);
HUMBUL Gateway (http://users.ox.ac.uk/~humbul/);
Topica: Humanities (http://www.topica.com/dir/?cid=204);
Voice of the Shuttle: Web Page for Humanities Research
(http://vos.ucsb.edu/);
World Lecture Hall: Humanities
(http://wnt.cc.utexas.edu/~wlh/search/results.cfm?count=1&from=browse&DescriptorID=43);
WWW Virtual Library: Humanities (http://www.hum.gu.se/W3VI/).


Resources in the Humanities
(http://www.library.ucsb.edu/subj/humanit.html) : This is also a
mega site covering the whole range of sources in the area of
Humanities. The site provides links to resources such as
UCSB Library Humanities Collections; Humanities Centers;
Megasites (e.g. Art Access; ArtsWire; CultureNet; Galaxy
Guide to Humanities; Yahoo! Arts; Yahoo! Society and
Culture); Popular Culture; Online Multimedia Projects &
Exhibits; Publishers, Booksellers & Library Catalogs; Online
Text, Journals & News Sources; Events & Conferences, and
much more.


The English Server (eServer)
(http://english-server.hss.cmu.edu/) : Maintained by the
University of Washington, it holds over 18,000 Humanities
texts online.


About.com (formerly the Mining Co.)
(http://www.about.com) : It is an Internet directory compiled
by ‘real people’ to help you sift through all the sites you do
not have the time to look at. It also links to hundreds of
guides, including many on books, writing, visual arts, and the
media. You may browse or search by interest areas/topics.


Academic Info (http://www.academicinfo.net) : It is an
extensive subject directory of Internet resources for the
university community.


Interesting www Sites about Science - Science: Social
Sciences, Humanities and Arts
(http://www.ac.by/science/human.html) : The site covers Internet
resources in various disciplines including Humanities. In the
area of Humanities, it provides links to all the major
Humanities subject gateways; University Department resources;
Organizations; archives; guides and much more.


RESOURCE GUIDES 


Some of the important resource guides are listed below. 


Arts & Humanities Collection Michigan Arts &
Humanities (http://mel.lib.mi.us/humanities/HUM-general.html)
and (http://mel.org/humanities/HUM-general.html) : This service
is funded in part by the State of
Michigan through the Library of Michigan. Additional project
support comes from the federal Library Services and
Technology Act (LSTA) via the Institute of Museum and Library
Services (IMLS). The service provides pointers to all major
resources in this area, thereby acting as one of the major
Resource Guides in the field of Arts and Humanities. Broadly,
the site includes links to General Resources in the Arts &
Humanities; E-Zines; portals; several commercial databases
and other related resources.


Encyberpedia (http://www.encyberpedia.com/ 
edindex.htm) : It is a directory of subject-orientated links, 
arranged by broad subject groups, but with a more detailed 
index. 


Metaplus (http://www.metaplus.com) : It is a Web 
directory with extensive links to sites on books/literature, 
publishing, media, magazines, libraries, and writers’ 
resources. 


H-NET: Humanities OnLine Homepage
(http://h-net.msu.edu/) : H-Net is an international
interdisciplinary organization of scholars and teachers
dedicated to developing
the enormous educational potential of the Internet and the
World Wide Web. Its edited lists and web sites publish
peer-reviewed essays, multimedia materials, and discussion for
colleagues and the interested public. The computing heart of
H-Net resides at MATRIX, the Center for Humane Arts,
Letters, and Social Sciences Online, Michigan State
University, but H-Net officers, editors and subscribers come
from all over the globe. H-Net’s hundreds of volunteer editors
foster on-line communities in the humanities and social
sciences by monitoring email-based discussion lists and
associated web sites. H-Net was one of the first to join the
organizational signatories of the Budapest Open Access
Initiative (BOAI), which went public on February 14, 2002. The
BOAI states that “The literature that should be freely
accessible online is that which scholars give to the world
without expectation of payment.”


Humanities Resources on the Internet
(http://www.sil.org/~radneyr/humanities/resources.html) : This
site is a mega resource site providing links to a number of
quality resource indexes; University Departments having rich
Humanities collections on the Internet; professional societies;
journals and discussion groups; and Humanities computing
resources.


GALILEO Internet Resources - Arts and Humanities
(http://www.usg.edu/galileo/internet/arts/artsmenu.htm) : It
lists wide-ranging resources in different areas of Arts and
Humanities. The resources include Directories; Periodical
Indexes; Periodicals; Databases; Images; Women Architects;
Academic Departments; Organizations & Societies; Firms;
Jobs; Conferences and much more.


RATING SYSTEM 


Argus Clearinghouse Rating System, available at
http://www.clearinghouse.net/ratings.html, is a rating system.
The home page of the site gives a clear picture
of what such sites are and how such rating systems
function.


SEARCH ENGINES 


Various search engines are devoted to the humanities
area; the important ones are given below.


ProFusion (http://www.profusion.com) : It is a 
metasearch engine, which is very good for advanced 
searches. 


Google - Humanities (http://directory.google.com/Top/
Arts/Humanities/) : It is a superior and innovative search
engine that uses the number of links to a site to rate its
importance. The ‘I’m Feeling Lucky’ button automatically takes
you to the first Web page, i.e., the best result/match, returned
for a query. Click on the bar graph at the beginning of the
result to see which pages link to the particular page. It is
definitely one of the coolest search engines and is strongly
recommended. Documents in the following categories in
this subject area are included.


Anthropology; Art History; Classical Studies; 
Dictionaries; Great Books Indices; History; Languages; 
Literature; Literature in Art; Mailing Lists; Medieval Studies; 
and Philosophy. 
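The idea of rating a site by the number of links pointing to it can be shown with a small Python sketch. This is a deliberately simplified illustration that only counts inbound links; it is not Google's actual ranking algorithm, and the page names and link data are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(outlinks):
    """Rank pages by how many distinct other pages link to them.

    `outlinks` maps each page to the list of pages it links to.
    """
    inbound = Counter({page: 0 for page in outlinks})
    for source, targets in outlinks.items():
        for target in set(targets):      # count each linking page once
            if target != source:         # ignore self-links
                inbound[target] += 1
    # Highest count first; ties broken alphabetically for a stable order
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical miniature web of four pages
web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}

for page, count in rank_by_inbound_links(web):
    print(page, count)
```

Here c.example comes first because three other pages link to it, while d.example, which nothing links to, comes last. Real link-analysis methods also weigh how important each linking page is, which this sketch leaves out.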


Hotbot (http://www.hotbot.com) : It is easy to use,
fast and powerful, and one of the most consistently top-rated
search engines. It offers a single form, with a pull-down menu,
to search for a phrase, person, or any combination of search
terms and to fine-tune your output preferences. It is good for
‘exact phrase’ searches, and also lets you search for sites
with audio or video features.


Magellan (http://www.magellan.excite.com) : It is
another good online guide to the Web, with reviews
by broad subject area, and with ‘Search Voyeur’, an intriguing
real-time feature that allows you to ‘spy’ on searchers.


Yahoo-Humanities (http://dir.yahoo.com/Arts/ 
Humanities/) : The following are the Categories in this subject 
area for which pointers are provided by this site. 


Bibliographies; Chats and Forums; Classics; Critical 
Theory; Cultural Studies; Education; Events; History; 
Institutes; Journals; Linguistics and Human Languages; 
Literature; Medical Humanities; Organizations; Philosophy; 
Theology; Web Directories. 


Besides, the site also points to Site Listings.




E-mail Search : For tracking down e-mail addresses 
and locating experts and others, the following search engines 
may be more suitable for the job: 


Bigfoot Directory Search (http://www.bigfoot.com) : It
tracks down e-mail addresses. Not infallible, but on the whole
works quite well.


Mesa (http://mesa.rrzn.uni-hannover.de) : It is an e-mail
search agent that claims to be ‘the largest e-mail address
book worldwide’. Good for continental European e-mail
addresses.


Yahoo People Search (http://people.yahoo.com) : It 
also tracks down e-mail addresses. 


Search Tools and Search Tips 


Below are listed some sites that help to find search tools 
and tips. 

Finding Information on the Internet: A Tutorial (http:// 
www.lib.berkeley.edu/TeachingLib/Guides/Internet/ 
FindInfo.html) : This site is from the Library at the University 
of California at Berkeley. This is an excellent and easy-to- 
use tutorial on the latest and best search tools, and the best 
search strategies - progressing from simple searches to 
advanced searching - together with a general introduction to 
the Internet and the World Wide Web. In addition to 
recommending search strategies that are adaptive to specific 
topics, it provides detailed search instructions to five major 
search engines namely, Google, Altavista Advanced Search, 
Infoseek, Northern Light, and FAST Search. 


The Spider’s Apprentice (http://www.monash.com/
spidap.html) : This is a useful guide for beginners. It covers
how to get the most from search engines, tips for search
strategies, analysis of search engines, frequently asked
questions, and more.




ZD-Net/PC Magazine Online: Your Complete Guide 
to Searching the Net (http://www.zdnet.com/pcmag/ 
features/websearch/intro.htm) : It reviews and provides 
assessments of search engines, indexes and directories. 


(C) By Organization

On the basis of organization, the e-resources can be
divided into the following types.

• Humanities Centers and Organizations
• Text Centers
• Professional Societies
• Networking Organizations
• Research Associations
• University Departments


HUMANITIES CENTERS AND ORGANIZATIONS 


These centers and organizations are mission-oriented
institutions. Besides working to fulfill their specified
missions in the subject area, they also provide a wide range
of resources in the subject and/or pointers to resources in
the area.


UNESCO - Social and Human Sciences
Documentation Centre (http://www.unesco.org/unesdi/) or
(http://www.unesco.org/general/eng/infoserv/doc/shsdc/) :
All UNESCO information sources are available from this
single access point. These include bibliographic and referral
databases (directories, projects, etc.) and full-text databases
produced by UNESCO in its domains of competence: education,
natural sciences, culture, social and human sciences, and
communication and information. The site also covers the
Information Services located at Headquarters and in the Field
Offices, and gives access to the full texts of official
UNESCO documents, the photobank, the worldwide
translations database and thesauri. Some links to directories
for specific themes, such as worldwide portals for libraries,
archives, poetry, oceanography, etc., are also provided. The
information sources are classified by type and by theme,
reflecting the main areas of activity of UNESCO.


Americans for the Arts (http://www.artsusa.org/) : 
Americans for the Arts is a national support group for issues 
of public policy relating to arts and culture. This site features 
information and news on arts funding, education, advocacy, 
arts awareness resources, and links to other arts internet 
resources. 


The Center for the Book in the Library of Congress
(http://lcweb.loc.gov/loc/cfbook/) : The Center for the Book
was established by law in 1977 to stimulate public interest in 
books, reading, libraries and literacy. The website contains 
information on its State Centers Affiliates; Community of the 
Book Organizations; Book History/Book Art Programs; 
Reading Promotions; Book Fairs, Literary Festivals, other 
Book Events; and Book Lists. 


The National Endowment for the Humanities (http:// 
www.neh.gov/) : The NEH is a federal agency that supports 
learning in history, literature, philosophy, and other areas of 
the humanities. It funds research, education, museum 
exhibitions, documentaries, preservation, and activities in the 
states. 


National Humanities Institute (NHI)
(http://www.nhumanities.org/) : NHI promotes research, publishing,
and teaching in the humanities. The NHI website includes an 
online journal, NHI publications and research listings, an 
electronic bulletin board, and links to other humanities 
research on the net. 


TEXT CENTERS 


Text Centers provide the primary means of access to,
as well as information on, full-text scholarly resources
available at the various institutions. Texts are generally
arranged by language, subject, and searching interface. Many
such facilities do not point to digital facsimiles of texts; rather,
they direct patrons to “searchable” encoded texts and texts
where one finds the requisite information. However, some provide
pointers to the text itself. An Electronic Text Center supports
the research and instructional use of electronic texts, mainly
primary sources in the humanities and social sciences, from
a dedicated facility. Many such sites also offer assistance in
using and creating e-texts, in the form of consultation tips,
instructions and point-of-use guides.


Electronic Text Center - University of Virginia Library
(http://etext.lib.virginia.edu/) : The home page of this text
center is self-explanatory and highlights what such sites offer.


Center for Electronic Texts in the Humanities (http://
www.ceth.rutgers.edu/) : The Center for Electronic Texts in
the Humanities is dedicated to helping people access and
implement research projects using electronic texts. This
Project is a joint effort at Princeton and Rutgers Universities.
The site gives detailed information about the items indicated in
Table 5.4 below; each item can be reached online by clicking on
its link.


Table 5.4. Items in CETH - Humanities

General Information | Introductory Material
• Contact | • XML Resources
• Hours | • SGML Resources
• Staff | • Cold Fusion Resources
• TAG | • HTML Resources
• Conferences | • FAQs about Etexts
• Search Engine Basics | • Overview of Computing
• Latin Texts XML | • Guidelines for Evaluation
• Projects | • Information Services
• Projects | • Directory of Etext Centers
• Workshops | • Computing Resources
• ASNT Conference 2002 | • Data Center


CETH Directory of Electronic Text Centers
(http://www.ceth.rutgers.edu/ information Services/ectrdir.html) :
This Directory is available in an html version on a web server
at Rutgers University’s Scholarly Communications Center at
http://harvest.rutgers.edu/ceth/etextdirectory/volume.html/
and provides links to a number of electronic text centres.


Other important text centers that facilitate access to 
specific resources include: 


• Oxford University: Humanities Computing Unit at
http://info.ox.ac.uk/oucs/humanities/index.html.

• Humanities Text Initiative at http://www.hti.umich.edu/.

• Computers & Texts at
http://info.ox.ac.uk/ctitext/publish/comtxt/.

• CTI Centre for Textual Studies at
http://info.ox.ac.uk/ctitext/index.htm.

• Centre for Computing in the Humanities, King’s College
London at http://www.kcl.ac.uk/humanities/cch/.

• Centre for Computing in the Humanities (UToronto) at
http://www.chass.utoronto.ca:8080/cch/.

• CHORUS, Academic and Educational Computing in the
Arts/Humanities at http://www-writing.berkeley.edu/chorus/.


PROFESSIONAL SOCIETIES 


Professional Societies are bodies of persons belonging to
a profession, that is, an activity generally pursued as a
means of livelihood. These societies are set up with some
societal mission with regard to the profession, and hence
they always work towards fulfilling the objectives for which
they exist. Such bodies also make constant efforts
to design and develop new ways of looking at matters
pertaining to the profession: addressing core issues
for the overall development of the field; finding solutions to
problems; networking the various related organizations;
providing a platform for personal interaction within the
profession; and the like. Therefore, the resources which such
organizations put up or provide access to are very valuable and
contemporary in nature.


Humanities and Social Science Federation of Canada
(http://www.hssfc.ca/) : It is an association of scholarly
societies in Canada. Created by an amalgamation
of the former Canadian Federation for the Humanities (CFH)
and the Social Science Federation of Canada (SSFC), the
Federation came into being on April 1, 1996. The Federation
currently represents 67 learned societies, 69 universities and
colleges and over 24,000 scholars and graduates active in
the study of languages, sociology, literatures, religion,
geography, psychology, anthropology, history, philosophy,
classics, law, economics, education, as well as linguistics,
women’s issues, industrial relations and international
development.


National Endowment for the Arts
(http://arts.endow.gov/) : The National Endowment for the Arts
provides national recognition and support to significant
projects of artistic excellence, thus preserving and enhancing
the USA’s diverse cultural heritage. The Endowment was created
by Congress and established in 1965. It is an independent
agency of the federal government. This public investment in
the nation’s cultural life has resulted in both new and classic
works of art reaching every corner of America.




National Endowment for the Humanities (NEH)
(http://www.neh.fed.us/whoweare/index.html) : NEH is an
independent grant-making agency of the United States
government dedicated to supporting research, education,
preservation, and public programs in the humanities.


Social Sciences and Humanities Research Council of 
Canada (http://www.sshrc.ca/) : It reports on grant support, 
conferences, projects, and current research. 


American Council of Learned Societies (http:// 
www.acls.org/) : The American Council of Learned Societies 
is a private non-profit federation of sixty-six national scholarly 
organizations. The mission of the ACLS is “the advancement 
of humanistic studies in all fields of learning in the humanities 
and the social sciences and the maintenance and 
strengthening of relations among the national societies 
devoted to such studies.” The site includes sections such as
About the ACLS; Fellowship and Grant Programs; Other Program
Activities; Constituent Learned Societies; Associates;
Affiliates; ACLS Directory; ACLS Publications; and Online
Scholarly Resources.


The Association for Computers and the Humanities 
(http://www.ach.org/) : The Association for Computers and 
the Humanities is an international professional organization. 
Since its establishment, it has been the major professional 
society for people working in computer-aided research in 
literature and language studies, history, philosophy, and other 
humanities disciplines, and especially research involving the 
manipulation and analysis of textual materials. The ACH is 
devoted to disseminating information among its members 
about work in the field of humanities computing, as well as 
encouraging the development and dissemination of significant 
textual and linguistic resources and software for scholarly 
research. 


NETWORKING ORGANIZATIONS 


Information about networking organizations can be 
found on the following sites. 


LTSN (http://www.ltsn.ac.uk/) : The Learning and
Teaching Support Network (LTSN) is a major network consisting
of a network of 24 subject centres, based in higher education
institutions throughout the UK, offering subject-specific
expertise and information on learning and teaching; a Generic
Centre offering expertise and information on learning and
teaching issues that cross subject boundaries; and an
Executive, located with the Institute for Learning and Teaching
(ILT), which manages and co-ordinates the network.


It aims to promote high quality learning and teaching
through the development and transfer of good practices in all
subject disciplines, and to provide a ‘one-stop shop’ of learning
and teaching resources and information for the Higher
Education (HE) community. It is funded by the four Higher
Education funding bodies in England, Scotland, Wales and
Northern Ireland.


Two new JISC (Joint Information Systems Committee)
services, the Technologies Centre and the Technology for
Disabilities Information Service, TechDis, are also co-located
with the LTSN Generic Centre in York, and work closely
with the LTSN. The Technologies Centre investigates the
application of new technologies in HE and FE, and TechDis
aims to enhance access, through technology, to learning and
teaching, research and administration for students and staff
with disabilities in HE and FE. The LTSN has something to
offer to all staff involved in learning and teaching in HE,
including academic staff, senior managers, learning
technologists, educational development staff, and staff
developers. It offers you the opportunity to share your best
ideas and develop them with like-minded colleagues in your
own subject area. The LTSN will become the primary
information and advice resource on learning and teaching
matters for all academic and related staff in HE. The LTSN is
able to provide information on current best practices, whether
subject-specific or generic.


Coalition for Networked Information (CNI) (http://
www.ninch.cni.org/) : CNI is an organization dedicated to
supporting the transformative promise of networked
information technology for the advancement of scholarly
communication and the enrichment of intellectual productivity.
Some 200 institutions representing higher education,
publishing, network and telecommunications, information
technology, and libraries and library organizations make up
CNI’s membership. CNI and Dartmouth College have announced the
availability of a new web site, Collaborative Facilities
(http://www.dartmouth.edu/~collab/index.html), designed to
collect, organize, and disseminate information about model
“collaborative facilities” on college and university campuses.
Visitors may “tour” and analyze documents from facilities
ranging from information commons to distance-education
offices to centers that assist faculty in integrating teaching
and new technology.


RESEARCH ASSOCIATIONS 


These associations are mainly concerned with research and 
development activities. Examples include: 


The Modern Humanities Research Association 
(MHRA) (http://www.mhra.org.uk/index.html/) : It 
encourages and promotes advanced study in the modern 
humanities. The association aims to maintain the broader 
unity of humanistic scholarship in the face of increasing 
specialization, and attempts to fulfil this purpose especially 
through the publication of original work, including journals, 
bibliographies, monographs, and other aids to scholarly 
research. Details of the association’s publishing and 
funding activities are available from these pages. The site’s 
menu offers links to various resources. 


PUBLISHERS, BOOKSELLERS AND LIBRARY 
CATALOGUES 


Publishers and booksellers are important to any 
library. Information on them, along with 
library catalogues, may be found on the following sites. 


OCLC FirstSearch (http://www.oclc.org/services/) : 
OCLC WorldCat is the OCLC Online Union Catalogue. For 
nearly three decades, libraries have shared their catalogues 
electronically to create the world’s largest database of 
bibliographic information. WorldCat offers over 47 million 
bibliographic records, representing 400 languages, and 
holdings information vital for collection development, 
cataloguing, authority control, and retrospective conversion 
services. Through the OCLC FirstSearch service, users 
can access 70 databases including familiar names from 
leading information providers as well as resources provided 
exclusively by OCLC. OCLC databases include WorldCat, 
ArticleFirst, Electronic Collections Online, PAIS International, 
PapersFirst, ProceedingsFirst, and the OCLC Union Lists of 
Periodicals. 


AAUP Online Catalog of University Press Publications 
(http://aaup.pupress.princeton.edu/) : The Association of 
American University Presses Online Catalogue currently lists 
over 75,000 book titles from over 60 presses. AAUP enhances 
Library of Congress records with price, print status, and 
descriptive text for each item. The AAUP Catalogue can be 
searched in several ways, including by author, title, and standard 
Library of Congress subject keywords. 


UCSB Library’s Infosurf: Other Libraries. Other 
Campuses (http://www.library.ucsb.edu/docs/other.html) : 
It provides UCSB Library’s links to online catalogues from other local, 
national and international libraries. 

Women’s Publications, Publishers, & Bookstores 
(http://www.igc.org/igc/issues/wnpub/#Open) : It lists resources 
from WomensNet. 

Besides, Yahoo! Links to the Publishing Industry at 
http://www.yahoo.com/Business_and_Economy/Companies/Publishing/ 
also gives information about them. 


UNIVERSITY DEPARTMENTS 


University Departments also form an excellent source 
for information resources in various subject areas. This 
category has already been discussed under campus wide 
information systems/university departments in this chapter. 


HOW TO KEEP UP-TO-DATE WITH NEW SOURCES IN 
ARTS & HUMANITIES 


It is important for academicians and researchers 
to keep up-to-date with the latest 
literature in their field. The wealth of information available 
is enormous, which makes this a very time-consuming process; 
you therefore need sources that make the task easier. 
The sources listed below will keep you informed about new 
resources on the Internet. You can subscribe to some of these 
bulletins electronically to receive email notification of new 
resources. 

Subject directories select and organize internet 
resources into browsable directories. The following directories 
list resources of relevance to academics and tertiary students. 


HUMBUL: New Resources 
(http://www.humbul.ac.uk/output/new.php?int=7) : It covers humanities sites of interest 
to scholars. 


Infomine: Scholarly Internet Resource Collections 
(http://infomine.ucr.edu/) : Click on What’s New next to each 
subject category to view new additions to the directory. You 
can also join an Alert Service to receive email notification of 
new sites as they are added to the directory. 


The New Resource Alert Service lists new resources. The 
email notification service can keep you informed of new 
resources as they are added to INFOMINE. You can initially 
choose, and later modify, one or more general subject areas 
of INFOMINE in which to receive alerts, as well as the 
frequency with which they are received. 


Academic Info - What’s New 
(http://www.academicinfo.net/new.html) : Sign up by email 
(madin@academicinfo.net) to receive the monthly 
announcement list. 


Northern Light 
(http://standard.northernlight.com/cgi-bin/cl_alert.pl?cb=200) : 
The Northern Light web search 
engine can be used to notify you of new resources in your 
area of interest. You can limit it to webpages or the Northern 
Light Special Collection articles. 


BUBL link Updates (http://bubl.ac.uk/link/updates/) : 
BUBL is a catalogue of Internet resources covering all academic 
areas. This page provides links to new resources in the area of 
Humanities. It also allows a chronological listing 
of new resources to be searched. 


infoLinX news (http://www.wmin.ac.uk/library/images/infonews) : 
infoLinX news aims to keep you up-to-date with 
new resources and developments. infoLinX, the new gateway 
to electronic resources, provides integrated access to a wide 
range of electronic resources. Locating information within the 
ever-increasing range of resources can be difficult. infoLinX 
provides a single point of entry that will help you navigate 
through both internal and external sources of information, 
delivering relevant information to your desktop. 


Information Search - University of South Australia 
Library (http://www.library.unisa.edu.au/resmetli/keepingup/keepingup.htm#internet) : 
It is used for keeping up to date 
with new information in long-term research projects, for which it is 
necessary to develop strategies to keep up with new 
information as it becomes available. The site provides pointers 
to new resources. 


Besides, the Scout Report (http://wwwscout.cs.wisc.edu/) 
is the best source for obtaining such information in any subject 
area. 


5.9.2. Internet Resources in Social Sciences 


The Social Sciences cover a vast range of subject areas. 
Broadly, the main subject areas of the Social Sciences are: 
General Statistics; Political Science; Economics; Law; Public 
Administration; Social Services; Association; Education; 
Commerce, Communications, Transport; Customs, Etiquette, 
Folklore; Geography; and History. Besides, many newly 
emerging and multidisciplinary areas evolving in 
the process also form part of the Social Sciences. A variety of 
different kinds of resources are available in all these subject 
areas on the Internet. However, their depth and quantity may vary 
from area to area. 


The list given here is by no means comprehensive; a 
comprehensive list is neither possible nor feasible, because 
new resources are added to the existing pool of 
Internet resources every hour, and some sites vanish as well. 
Any list is therefore liable to change. Hence, an attempt has been made 
to give an exposure to the types of resources available in the area 
of social sciences, with suitable examples. The resources 
have been grouped under various types based on the nature 
of the sources. Since many resources have multifaceted 
features, some overlap between groups is inevitable. 
Now, we shall take stock of resources in the social sciences: 


• By Source Type 
• By Category/Resource Types 
• By Organization 


(A) By Source Type 

There are several types of sources available, such as: 

• Electronic Journals 
• Directories of all types of WWW E-Journals 
• Online Indexes of Print or Electronic Journals 
• Current Contents of Journals 
• Discussion Lists or Forums/Usenet Newsgroups/Mailing Lists 
• Preprints and Working Papers 
• Software Archives 
• Data Archives 
• Subject Databases 
• Statistical Packages 
• Research Projects 
• Data Centres 
• Campus Wide Information Systems (CWIS)/University Departments 
• Document Delivery 
• Reference Sources 
• Courseware Directories/Teaching Resources/Training Materials 
• Virtual and Remote Experimental Laboratories 
• Directories 
• Online Documents 


ELECTRONIC JOURNALS 


Journals that can be accessed over the Internet are 
appearing with increasing frequency, as more 
and more journals are made available in electronic format. 
There are a number of sites that provide access to 
e-journal resources in the social sciences. Examples include: 


World Wide Web Virtual Library - Social Sciences 
(http://vlib.org/) : It includes over 100 journals in various 
areas of the Social Sciences. The index lists journals in 
alphabetical order, and a search can be initiated by clicking on a 
letter to jump to the appropriate section. Directories 
of all types of WWW e-journals and other online indexes of 
print or electronic journals are also part of this site. The social 
sciences are listed under Society at the bottom of the home 
page of this site. Clicking on the relevant link shows a 
pool of resources in the area. The journal sites included in the 
Social Sciences World Wide Web Virtual Library cannot be 
reproduced in full here, since the list is long. However, a few 
examples are included here for reference purposes. 


• Asian Studies - E-Journals Register (U. Koln, 
Germany) available at 
http://www.uni-koeln.de/phil-fak/indologie/AsianE-Journals.html. 

• Canadian Journal of Sociology (U. Alberta, Canada) 
at http://www.ualberta.ca/~cjscopy/cjs.html. 

• Chronicon: An Electronic History Journal (University 
College, Cork, Ireland) available at 
http://www.ucc.ie/chronicon/. 

• Electronic Australian Journal of Management 
(University of New South Wales, Australia) at 
http://www.agsm.unsw.edu.au/~eajm/ : It publishes research in applied 
economics, finance, industrial relations, political 
science, psychology, statistics, and other disciplines 
as applied to management, as 
well as research in areas such as marketing, corporate 
strategy, operations management, organization 
development, decision analysis and other problem-focused 
paradigms. 


• E-law: Murdoch University Electronic Journal of Law 
(Murdoch Univ., Australia) at 
http://www.murdoch.edu.au/elaw/ : It is a general journal of 
law and legal issues published by the Murdoch 
University School of Law. It contains articles in four 
categories: Net Watch, Current Developments, 
Materials for Comment / Works in Progress, and 
Refereed Articles. 

• EURODATA Newsletter at 
http://www.mzes.uni-mannheim.de/eurodata/newsletter/newsletter.html : It 
is published by the EURODATA Research Archive of 
the Mannheim Centre for European Social Research 
(MZES). Issued twice yearly, it contains information for social 
scientists concerned with comparative research on 
Europe. 

• History Reviews On-Line (Univ. Cincinnati, USA) at 
http://blues.fdl.uc.edu/www/history/reviews.html : It is an 
electronic quarterly devoted to reviewing books in all 
fields of history. 

• Internet Resources Newsletter (Heriot-Watt University, 
UK) at http://www.hw.ac.uk/libWWW/irn/irn.html. 


• Latitudes, The McGill Journal of Developing Areas 
Studies (McGill University, Canada) at 
http://vub.mcgill.ca/journals/latitudes/ : The journal increases 
awareness and stimulates more informed opinions 
regarding the interdependency of developing and 
developed worlds by opening discourse on issues 
related to developing areas. 

• MERGER Newsletter (Utrecht University, Netherlands) 
at http://www.ercomer.org/merger/index.html : It is the 
newsletter of the Migration and Ethnic Relations Group 
for European Research. It is published three times a 
year by the European Research Centre on Migration 
and Ethnic Relations (ERCOMER) in the Netherlands. 

• Policy and Research Report (USA) at 
http://www.urban.org/periodcl/prr.htm : The Report (ISSN 
074-8485) is published several times a year by the 
Urban Institute, Washington, D.C. The report 
summarizes many of the activities and some of the 
research of the Institute. 


• Political Science Quarterly (PSQ) (USA) at 
http://epn.org/psq.html : It is a scholarly, nonpartisan journal 
on government, politics, and public policy, both 
international and domestic. 

• Polyphony - Newsletter of the Centre for Immigration 
and Multicultural Studies (Australian National 
University, Australia) at 
http://coombs.anu.edu.au/SpecialProj/CIMS/Polyphony/Polyphony.html. 

• Postmodern Culture Journal (University of Virginia, USA) 
at http://jefferson.village.virginia.edu/pmc. 

• Qualitative Report (Nova Southeastern University, 
USA) at http://www.nova.edu/ssss/QR/index.html : It 
is an on-line journal devoted to writing and discussion 
of and about qualitative research and critical inquiry. 

• Social Research Update (University of Surrey, UK) at 
http://www.soc.surrey.ac.uk/sru/sru.html : It is a 
quarterly e-journal which covers new developments in 
social research, one per issue. 

• Sociological Research Online (University of Surrey, UK) 
at http://www.socresonline.org.uk/socresonline/ : It is 
an e-journal which promotes rapid communication 
among sociologists. 

• Stanford Journal of Law, Business & Finance (Stanford 
University, USA) at http://www.stanford.edu/group/sjlbf. 


Academic Press (http://www.apnet.com/www/ap/aboutap.htm) : 
The publisher offers the full text of over 170 
journals in Social and Behavioural Sciences and Economics. 
The areas in which journals are available from 
this publisher are given below. 


Archaeology and Anthropology; Economics; Education 
Research; Finance and Business; Gerontology and 
Geriatrics; Information and Library Science; Psychology 
and Related Behavioral Sciences; Sociology, Statistics 
and Social Sciences; Speech, Language, Hearing, and 
Audiology. 


Register of Leading Social Sciences E-Journals 
(http://www.clas.ufl.edu/users/gthursby/socsci/ejournal.html) : It 
keeps track of on-line serials of significance to researchers 
in Social Sciences and Humanities. This site is a part of the 
Social Sciences Virtual Library which was established at the 
Australian National University. 


ECONbase - Elsevier Science 
(http://www.elsevier.com/homepage/sae/econworld/menu.htm) 
or (http://www.elsevier.nl/) : The site acts as an access 
point to economics journals. Currently the resource base 
provides access to 60,000+ online papers. Online access to 
full-text articles in ECONbase is available to those readers 
whose library is either registered with ScienceDirect® Web 
editions or subscribes to ScienceDirect Digital Collections. 
In all cases, access is restricted to those journals to which 
the library holds a current subscription. Except for online 
access to full text, there are no restrictions on access to 
information within ECONbase. Available functionality 
includes browsing tables of contents and searching titles, authors, 
abstracts and keywords in a database of journals. In addition, 
author and keyword indexes are provided for each individual 
journal, along with abstracts for all 
ECONbase journals, selected full text from a limited number of 
journals and a free sample copy of each journal. 


PSYCLINE (http://www.psycline.org/journals/psycline.html) : 
PSYCLINE is owned and managed by 
psychologist Dr. Armin Gunther, University of Augsburg, 
Germany. The website started in 1995 under its former name, 
Links to Psychological Journals, and has won a high reputation 
as one of the most comprehensive and up-to-date indexes of 
psychology and social science journals on the web. It covers 
over 1500 journals in the area of Psychology and Social 
Sciences with free tables of contents and abstracts. In 2001 
Links to Psychological Journals was renamed 
PSYCLINE: Your Guide to Psychological and Social Science 
Journals on the Web. The home page of PSYCLINE is 
reproduced below in Fig. 5.1. 


Web Journal of Current Legal Issues 
(http://webjcli.ncl.ac.uk/index.html) : The Web Journal of Current 
Legal Issues is published bi-monthly on the World Wide Web. 
The focus of the Journal is on current legal issues in judicial 
decisions, law reform, legislation, legal research, policy 
related socio-legal research, legal information, information 
technology and practice. Contributions to the Articles, 
Comments, Case Notes, Legal Education and Information 
Technology sections are refereed. The Journal was 
associated with Blackstone Press Ltd from its inception until 
2001. A full-text archive of the journal is available. 


Fig. 5.1. Home page of PSYCLINE 


DIRECTORIES OF ALL TYPES OF WWW E-JOURNALS 


The Social Sciences World Wide Web Virtual Library 
(http://vlib.org/) also includes directories of all types of WWW 
e-journals, which are an excellent starting point when 
you are looking for e-journals in the area of Social Sciences. 
Examples include: 


Australian Journals Online (Australia) 
(http://www.nla.gov.au/oz/ausejour.html) : It is a current listing of 
over 1,000 Australian electronic journals, magazines, 
webzines, e-mail fanzines, etc., including overseas works 
with Australian content, authorship and/or emphasis. 


Directories of Electronic Journals 
(http://gort.ucsd.edu/ejourn/jdir.html) : It is a keyword searchable 
database and has extensive links to other resource facilities. 
It is supported by the University of California at San Diego. 


e-zine-list (http://www.meer.net/~johnl/e-zine-list/) : It 
is a directory of electronic journals and magazines accessible 
via the Web, Gopher, FTP, email, or other services. The list 
is updated approximately monthly. It is arranged alphabetically 
by title and can be browsed by keyword. 


Journals and Newspapers (Carnegie Mellon University, 
USA) (http://eserver.org/journals/) : It is a short alphabetical 
list of online journals, with links to other directories. 


New Jour (http://gort.ucsd.edu/newjour/) : It is an 
archive for a major list of electronic journals and newsletters 
available on the Internet. 


Scholarly Journals Distributed Via the World Wide 
Web (University of Houston, USA) 
(http://info.lib.uh.edu/wj/webjour.html) : It is an alphabetically organized directory that 
provides links to established web-based scholarly journals 
that offer access to English language article files without 
requiring user registration or fees. 


ONLINE INDEXES OF PRINT OR ELECTRONIC 
JOURNALS 


Social Sciences World Wide Web Virtual Library also 
includes other online indexes of print or electronic journals 
for reference purposes. Examples include: 


Anthropological Index Online 
(http://lucy.ukc.ac.uk/cgi-bin/uncgi/Search_AI/search_bib_ai/anthind) : It is an index 
to current periodicals in the Museum of Mankind Library. 
Currently the years from 1970 onward are available. It is 
searchable by year, subject area, author, title, or journal. 


Electronic Journals related to Learning Technologies 
(http://olt-bta.hrdc-drhc.gc.ca/info/eljoue.html) : It is a 
hyperlinked list provided by Information Place. 


CURRENT CONTENTS OF JOURNALS 


In order to keep abreast of new developments in a 
particular topic or issue, one of the services that 
people use is current contents listings of journals. With 
the advent of e-publishing and the Internet, these 
current contents listings have become available in real time. 
Not many journals have their full text available on the 
web, so getting the full text of articles is not always 
possible over the Internet. Many academic publishers 
have therefore designed web pages for each of their journals, on which 
they list the contents of the latest issue of the 
journal, sometimes with abstracts. A regular scan of 
relevant journal web pages can thus help to keep you up-to-date 
with new articles as they appear. Almost all the journals 
available in e-format provide access to tables of contents for 
free. 


Academic Press Journals (http://www.apnet.com/journals) : 
Now part of Elsevier Science, it provides links to 
the latest tables of contents and subscription information for 
all 174 Academic Press journals in the area of social and 
behavioural sciences. 


International Social Science Journal 
(http://www.blackwellpublishers.co.uk/asp/journal.asp?ref=0020-8701) : 
It is a UNESCO publication that aims to be international 
in scope and authorship, bridging the communities of social 
scientists between disciplines and between different parts of 
the globe. It is interdisciplinary and international, diffusing 
information and debate to the widest possible audience. 
A full-text online version of recent issues is also 
available to institutions that subscribe to the hard copy. 


American Journal of Political Science 
(http://www.library.iisc.ernet.in/access/wklstper/newjournal.html) : 
It is published quarterly by the University of Wisconsin Press, 
facilitating access to table of contents and abstracts from 1995 
onwards. Besides, the site also contains research articles 
and book reviews covering all fields of Political Science. 


DISCUSSION LISTS OR FORUMS/USENET 
NEWSGROUPS/MAILING LISTS 


The terms discussion lists, Usenet newsgroups and mailing lists are used almost 
interchangeably. The Internet is 
interactive and offers new channels for scholarly discourse 
and new sources of information based on archives of this 
discourse. One can choose to communicate with people or 
simply to observe others’ communications. There are a large 
number of scholarly discussion groups available, and some 
possess archives of all the messages posted to them, which 
can often be searched by keyword. Examples include: 


Mailbase (http://www.mailbase.ac.uk/lists.html) : It 
provides access to over 2,000 electronic discussion lists for 
the UK higher education and research community. The site 
provides descriptions, message archives and 
subscription information for numerous mailing lists relevant 
to education, and also includes information about how to 
join any of these lists. 


The SOSIG Mailing List 
(http://www.mailbase.ac.uk/lists/sosig/) : It has over 400 members from the worldwide 
social science community and distributes messages about, 
among other things, new Internet sites and services for social 
scientists. 


The Ecol-Econ Mailing List 
(http://csf.colorado.edu/ecolecon/index.html) : This list hosts discussion 
of alternatives to the prevailing economic paradigms and 
of such questions as sustainability, the role of economic growth, 
free trade and the environment, and the role of multilateral 
economic institutions in the sustainability of the development 
process. The site also provides the full text of essays from 
some of the members of the mailing list and some links to 
related sources of information. 


Cti-law (http://www.mailbase.ac.uk/lists-a-e/cti-law/a) : 
It is a discussion group of lawyers, especially those interested 
in the use of information technology within legal teaching, 
including discussion of LEXIS/NEXIS (usage, cost, 
contractual arrangements, and the like) and other sources of 
legal information. 


DIRECTORIES OF NEWSGROUPS AND MAILING LISTS 


Deja.com (http://www.deja.com) : It searches a vast 
number of discussion forums and Usenet groups, including 
archives of previous postings. The new Deja.com now also 
aims to serve as an Internet consumer guide, with ratings of 
products, while Deja Tracker informs you by e-mail about 
new postings in your favourite newsgroups. 


ForumOne (http://www.forumone.com) : It is a speciality 
search engine which helps you locate messages posted on 
over 270,000 Web discussion forums. 


Liszt (http://www.liszt.com) : It is a searchable 
database of over 90,000 mailing lists/discussion groups. 


Neosoft (http://www.neosoft.com/internet/paml) : It is 
a huge directory of publicly accessible mailing lists and Usenet 
newsgroups, etc., with details about traffic and how to join. 


Usenet Groups (ftp://rtfm.mit.edu/pub/usenet-by- 
hierarchy) : It is a directory to Usenet groups by hierarchy. 


PREPRINTS AND WORKING PAPERS 


The Internet is increasingly being used by academics 
to publish the full text of conference papers, draft papers or 
work-in-progress, and other similar material, often to facilitate 
the peer review process. Examples include: 


London Business School, Centre for Marketing 
Working Papers 
(http://www.lbs.ac.uk/marketing/Working_Papers/working_papers.html) : It provides a list of working 
papers from the Centre for Marketing at the London Business 
School. Available from 1995, it includes abstracts for all the 
papers and full text for most of the papers in Adobe Acrobat 
(PDF) format. 


WoPEc (http://netec.mcc.ac.uk/WoPEc.html) : It is an 
international effort to collect and make available 
working papers in the area of economics by academics 
from world over. The database is continually growing and 
contains thousands of working papers from hundreds of 
series. WoPEc is part of a larger project called NetEC which 
is an international academic project for networking interactions 
in economics. 


Working Papers in Psychology 
(http://www.cogs.susx.ac.uk/cgi-bin/htmlcogsreps?wpip) : It gives 
a list of papers written by staff at the School of Cognitive and 
Computing Sciences at the University of Sussex. It includes 
abstracts of the papers and, in some cases, full text as well, which 
can be accessed via FTP over the Internet. 


SOFTWARE ARCHIVES 


The Internet offers access to numerous shareware 
software packages. Examples include: 


Software and Datasets for Sociology and Demography 
(http://www.stat.washington.edu/raftery/Research/Soc/soc_software.html) : 
Several software packages and datasets are 
available from this site. 


Software Resources on the Internet 
(http://psych.hanover.edu/Krantz/software.html) : A 
number of software packages are available as freeware or shareware in 
various areas of the Social Sciences. The site provides links to 
such software resources, which can also be 
downloaded. 


DATA ARCHIVES 


There are a large number of data archives available on 
the Internet in the area of social sciences. Examples include: 


Social Science Hub (http://www.sshub.com/index.html) : 
Social Science Hub covers resources for 
Anthropology, Sociology and Archaeology and other 
associated disciplines. It provides links to data archives, 
Websites, newsgroups, news, research tools and 
publications. 


Guide to Primary Social Science Research Data and 
Related Resources available on the Internet 
(http://www.chass.utoronto.ca/datalib/other/) : It is a guide to data 
libraries, data archives, and related institutions about which 
information is available via the Internet, as well as to primary 
research data and related resources available for access or 
acquisition via TCP/IP-based tools. Literature on data 
management is also available. The guide includes 
quantitative or numeric data, as well as textual resources. It 
is a major resource guide that provides links to directories of 
data archives and data libraries, and to individual data archives, data 
libraries, and related institutions, besides other useful 
resources in the social sciences. Broadly, it covers the following: 


(1) Directories of data archives and data libraries, (2) 
Indices, union catalogues, and resource guides to research 
data files, (3) Professional data associations, (4) Individual 
data archives, data libraries, and related institutions, (5) Data 
producers, (6) Other data resources, (7) Statistics and 
software, documentation and sources, (8) Text archives, (9) 
Selected text resources, (10) Data-related conferences, (11) 
Data-related training opportunities, (12) Electronic journals, 
(13) Electronic listservs. 


Clickable Map of Major Social Science Data Archives 
(http://www.nsd.uib.no/cessda/europe.html) : It provides 
links to various social science data archives of Europe. 


The Data and Program Library Service (DPLS) 
(http://dpls.dacc.wisc.edu/) : It is the central repository of data 
collections used by the social science research community 
at the University of Wisconsin, Madison. Its mission is to 
promote academic research by facilitating the use of 
secondary research materials. To fulfill this mission the library 
acquires, preserves and facilitates access to social science 
data resources, provides reference and technical services to 
researchers, and assists in the archiving of locally produced 
data. The DPLS is part of the Data and Computation Center 
(DACC) and is subject to its policies. A faculty advisory 
committee works with the Director of DACC and DPLS staff 
to oversee policy. 


The data library’s current holdings mainly support social 
science research employing statistical methods on numeric 
data. These holdings include several thousand studies. The 
majority of these studies were obtained through the University 
of Wisconsin-Madison’s membership in the Inter-university 
Consortium for Political and Social Research (ICPSR). They 
also acquire data from inter-governmental organizations, U.S. 
statistical agencies, and other data sources. Their collection 
spans a wide range of topics including historical and 
contemporary population characteristics, community and 
urban studies, intra-and international conflict, economic 
behaviour and attitudes, education, mass political behaviour 
and attitudes, and social institutions and behaviors. The data 
library facilitates access to its data collection via computer-based 
information resources such as the world wide web, 
through teaching classes both in house and in the classroom, 
and in one-on-one situations. The goal is to provide access 
to data as quickly and efficiently as possible while at the same 
time meeting the individual needs of the user. Whenever 
possible and relevant, users are provided with electronic 
versions of study documentation and input syntax for SPSS 
or SAS. 


The South African Data Archive
(http://www.nrf.ac.za/sada/index.asp) : The South African Data
Archive (SADA) serves as a broker between a range of data
providers (e.g. statistical agencies, government departments,
opinion and market research companies and academic
institutions) and the research community. The archive not
only preserves data for future use, but also adds value to
the collections. It safeguards datasets and related
documentation and attempts to make them as easily accessible
as possible for research and educational purposes.


Social Science Data Archive (Australia)
(http://ssda.anu.edu.au/) : The Social Science Data Archives
(SSDA), located in the Research School of Social Sciences 
at the Australian National University, was set up in 1981 with 
a brief to collect and preserve computer-readable data relating 
to social, political and economic affairs and to make the data 
available for further analysis. 


Regard (ESRC) (http://www.regard.ac.uk/regard/home/index_html?) :
It is the free database service of the
UK Economic and Social Research Council. Regard offers
access to information on UK economic and social research
since the mid-1980s, including abstracts, details of
publications and, for the most recent projects, full-text reports
of research findings. It covers over 70,000 records and over 6,000
research awards in sociology, politics, economics, 
anthropology, management, human geography, psychology, 
social history, linguistics and social policy. 


The UK Data Archive (UKDA)
(http://www.data-archive.ac.uk/) : The UK Data Archive (UKDA) is a resource
centre that acquires, disseminates, preserves, and promotes
the largest collection of digital data in the social sciences and
humanities in the United Kingdom. Its primary aim is to support
secondary use of quantitative and qualitative data for research
and learning. The UKDA also houses two specialist units,
the History Data Service (HDS) (http://hds.essex.ac.uk/) and
Qualidata - Qualitative Data Service
(http://www.qualidata.essex.ac.uk/), and provides access to
international data through cooperative agreements and
memberships with archives around the world.


SUBJECT DATABASES 


A database is a collection of records, each with details of
a different data item, whether numeric, textual or image-based,
and is usually searchable. There are thousands of database
resources available on the Internet. In the social sciences a large
number of databases are available. Examples include:


UNESCO Social Science Database - DARE: Directory
in Social Sciences - Institutions, Specialists, Periodicals
(http://www.unesco.org/most/dare.htm) : The DARE Database
offers over 11,000 worldwide references to social science 
research and training institutes; social sciences specialists; 
social science documentation and information services; social 
science periodicals. The database also contains special 
references to peace, human rights and international law 
research institutes. 


ERIC - Educational Resources Information Center
(http://www.eric.ed.gov/) : The ERIC database is the world’s
largest source of education information. The database
contains more than one million abstracts of education-related
documents and journal articles. You can access the ERIC
database on the Internet or through commercial vendors and
public networks. You can also access ERIC abstracts in the
print publications Resources in Education and Current Index
to Journals in Education. By searching the ERIC database, you
will retrieve citations and abstracts for education-related 
literature relevant to your search topic. These citations and 
abstracts are called “resumes.” There are two types of 
resumes in the ERIC database: ERIC Documents (ED) and 
ERIC Journal Articles (EJ). 


Population Index on the Web
(http://popindex.princeton.edu/) : Population Index is the primary
reference tool to the world’s population literature. It presents 
an annotated bibliography of recently published books, journal 
articles, working papers, and other materials on population 
topics. 


The SSRN (Social Science Research Network)
(http://www.ssrn.com/index.html) : Social Science Research
Network (SSRN) is devoted to the rapid worldwide
dissemination of social science research and is composed of
a number of specialized research networks in each of the
social sciences. Each of SSRN’s networks encourages the
early distribution of research results by publishing submitted
abstracts and by soliciting abstracts of top quality research
papers from around the world. There are now hundreds of journals,
publishers, and institutions in Partners in Publishing and in
academic and other Cooperating Institutions that provide
working papers for distribution through SSRN’s e-Library and
abstracts for publication in SSRN’s electronic journals. SSRN
e-Library consists of two parts: an Abstract Database
containing abstracts of over 42,600 scholarly working papers
and forthcoming papers, and an Electronic Paper Collection
currently containing over 23,600 downloadable full text
documents in Adobe Acrobat pdf format. The e-Library also
includes the research papers of a number of Fee Based
Partner Publications. The Networks encourage readers to
communicate directly with authors and other subscribers
concerning their own and others’ research. SSRN’s database
contains abstracts of some journals for both accepted paper series
and working paper series.


STATISTICAL PACKAGES 


Some computer programmers have made their
statistics programs freely available online via the Internet.
Examples include:


Two-sample Calculator
(http://www.stat.ucla.edu/calculators/twosamp/) : It is also
sometimes referred to as a t-test calculator. This is an online
calculator which allows you to enter your raw data via a Web
form and then automatically conducts the t-test for you. You
need to input two sets of numbers; it then gives a one-sided or
two-sided t-test for paired or independent samples. It can either
look up the P value for t, or it can compute the P value using the
randomization distribution.
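

To make the idea of a P value from the randomization distribution more concrete, the following is a minimal sketch in Python. It is not the code behind the calculator described above; the function names, the pooled-variance form of t, the number of rounds and the sample figures are all assumptions chosen for illustration. The two sets of numbers are repeatedly shuffled between the groups, and the P value is estimated as the share of shuffles whose t is at least as extreme as the observed one.

```python
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Student's t for two independent samples, using the pooled variance.

    Assumes each sample has at least two values and that the pooled
    variance is not zero.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def randomization_p_value(a, b, rounds=5000, seed=0):
    """Two-sided P value estimated from the randomization distribution of t."""
    rng = random.Random(seed)          # fixed seed so the estimate is repeatable
    observed = abs(t_statistic(a, b))
    combined = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(combined)          # reassign the values to the two groups at random
        t = t_statistic(combined[:len(a)], combined[len(a):])
        if abs(t) >= observed:
            hits += 1
    return hits / rounds


# Hypothetical raw data, as a user might type them into the Web form.
group_one = [5.1, 4.9, 6.2, 5.8, 6.0]
group_two = [4.1, 3.9, 4.8, 4.4, 4.0]
print(t_statistic(group_one, group_two))
print(randomization_p_value(group_one, group_two))
```

Looking up the P value in a table of the t distribution relies on the usual assumptions of the t-test, whereas the randomization approach uses only the data entered, at the cost of more computation.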


Statistical Data Locators
(http://www.ntu.edu.sg/library/stat/statdata.htm) : The resource provides an
enormous amount of statistical data that is relevant to social
scientists. The areas covered, by geographic location, are
Asia; Oceania; North America; Europe; Africa;
International; Latin America; and Others. Besides statistical
data locators, Financial Data Locators, Subject Guides and
bibliographies can also be reached from here. The resource
originates from Nanyang Technological University,
Singapore.


RESEARCH PROJECTS 


Research projects are an integral component of a
resource base in any subject area. There are a number of sites
available via the Internet which lay emphasis on the details of
research projects in the social sciences. Examples include:


Education-line (http://www.leeds.ac.uk/educol/) :
Education-line is a freely accessible database of the full text
of conference papers, working papers and electronic literature 
which supports educational research, policy and practice. 


Social Science Research Resources
(http://socsci.colorado.edu/POLSCI/RES/research.html) : The site
provides a pool of resources in various areas of the social
sciences under the following four broad categories: (1) Citing
Internet Resources; (2) Online Courses and Guides; (3)
Summer Programs in Data Analysis; (4) Social Science
Research Centers.


Age of Asia: Resources for Research
(http://www.lib.duke.edu/ias/eac/ILE4/) : The information from the
Age of Asia site has now been incorporated in the following
sites and hence can be accessed from these two links:


• South Asia Resources: at http://www.lib.duke.edu/ias/sasia/.


• Southeast Asia Resources: at http://www.lib.duke.edu/ias/SEAsia/.


The sites facilitate access to research resources on 
South Asia and Southeast Asia. 


DATA CENTRES 


Data Centres are primarily sites comprising
databanks that provide actual statistical data related to the
subject. These sources are of utmost importance in areas 
where statistical data is the key input. Examples include: 


Data Analysis in the Social Sciences
(http://uts.cc.utexas.edu/~fackler/data.html) : The site facilitates
access to a large body of information on Data Repositories; Data
Analysis Resources; Text Processing, Software, and Archival
Resources.


Social Science Data Centre
(http://stauffer.queensu.ca/webdoc/ssdc/key.htm) : The data
centre includes information on the Survey Files - covering 
Statistics on Canada and other surveys; Public Opinion Polls 
- featuring surveys from Decima, Environics, etc.; Aggregate 
Data - providing access to data as tables and time series; 
Geographic Files - facilitating access to digital mapping 
resources and other sources of data. 


CAMPUS WIDE INFORMATION SYSTEMS (CWIS)/ 
UNIVERSITY DEPARTMENTS 


Campus Wide Information Systems are becoming an
important resource base on the Internet for university
campuses world-wide that are available online. Such resources
provide in-depth information about the desired campus,
whether about academics, resources, course curriculum,
library catalogues, databases and other library resources,
campus accommodation, tuition fees, scholarships and the like.
Besides the individual university campus wide information
systems, most universities have web pages for each
department that provide contact details of the respective
faculty and staff. There are a number of sites that aim to help
you find web pages for a particular university and also for a
particular department of a university. Examples include:


Campus Wide Information Systems (CWIS) Using
WWW (http://www.hcc.hawaii.edu/hccinfo/cwis.html) : The
list represents the educational sites that are using
WWW as a front-end for a Campus Wide Information System.
The site gives a country-wise list of universities whose CWIS
resources are available online. This is a one-stop shop for
reference points on higher education CWIS resources.
Though the site is not a complete list of all the universities
whose information is available on the Net, it covers many
of them. The site also provides links to sites that hold
further links to such resources within a country.


Economics Departments, Institutes and Research
Centers (EDIRC) in the World
(http://ideas.uqam.ca/EDIRC/index.html) : There are now an amazing number of
economics institutions on the WWW and most of them have
been indexed here. Currently 6449 institutions in 213
countries and territories are listed, and new ones are being
added. The index is organized by countries and fields, to avoid
too long a download. Included are economics departments,
research centers and institutes in universities, as well as
finance ministries, statistical offices, central banks, think 
tanks, and other non-profit institutions where mainly 
economists are working. 


Geography Departments Worldwide
(http://geowww.uibk.ac.at/geolinks/) : It is a searchable database
of Geography Departments around the world. There are links 
to 955 Departments in 83 Countries of which 741 have already 
signed the “Add Department Form” and thus can be searched 
by Country, Keyword and Research fields. Reproduction and 
distribution are permissible for non-profit purposes only, but 
no changes are to be made to these documents without the 
author’s written consent. 


DOCUMENT DELIVERY 


Document Delivery Services are an important
component of a library’s inter library loan activity, through which
users are helped to obtain not only information about the
location of a document but also the document itself.
There are several initiatives available on the WWW that
provide online the document delivery services that were
earlier handled by traditional manual means. Examples
include:


British Library Document Supply Centre (BLDSC)
(http://www.bl.uk/services/document.html) : The British Library
Document Supply Centre is the single largest source for all
your document needs. BLDSC is the leading document
provider in the world. The Centre offers a rapid and
comprehensive document supply and interlibrary loan service
from BLDSC’s extensive collections to researchers and
scholars in all kinds of libraries and organizations. The
British Library has a large Document Supply Centre dedicated
to the supply of copies of journal articles, books and other
materials. You can register for a range of services for UK
and overseas customers. If you want to order a copy of a
particular journal article or conference paper straightaway,
the Articles Direct service is probably what you are looking
for. Inside web provides options for searching for relevant
journal articles and conference papers as well as ordering
them over the web. Facilities for several other services for
more specialist copying of library materials are also included
at this site.

TUG Interlibrary Loan and Document Delivery Service
(http://tug.lib.uoguelph.ca/illdd/) : This service is available
to registered faculty members, students and staff of the 
University of Guelph (UG), University of Waterloo (UW), and 
Wilfrid Laurier University (WLU) which together form the 
TriUniversity Group of Libraries (TUG). Alumni, Friends of 
the Library and community users can obtain inter library loan 
(ILL) service from their local public library. It offers ILL service 
to corporate and community users on a fee basis. 

UTL John P. Robarts Research Library
(http://www.library.utoronto.ca/robarts/robarts.htm) : John P.
Robarts Research Library contains the largest single
collection of the University of Toronto Library system. The
primary focus of the collection is in the social sciences and
humanities. This library web site provides links to libraries,
departments and services located in the Robarts Library 
building. The goal of the Resource Sharing Department 
(Document Delivery and Interlibrary Loan Services) is to 
facilitate the sharing of library materials. The department 
offers document delivery and interlibrary loan services geared 
to the timely delivery of material in both physical and electronic 
formats by means of either local or remote delivery systems. 


REFERENCE SOURCES 


There is a large collection of reference sources of varying
scope and form available on the Net for reference purposes.
Examples include: 


Yahoo Reference (http://dir.yahoo.com/Reference/) :
Yahoo! Reference covers dictionary, encyclopedia,
quotations, and world facts.


Quick Reference Sources
(http://www.mtsu.edu/~library/vref.html) : Quick Reference Sources provides links
to Internet Search Tools such as AltaVista; Best Search
Tools; Dogpile; Google; HotBot; Go.com; NetFirst : Signpost;
Yahoo. It also provides links to Quick
Reference by Topic, such as Almanacs; Associations;
Biographical Sources; Book Reviews; Books in Print; 
Bookstores; Careers; Children’s Literature; Cities; Colleges; 
Consumer Information; Dictionaries; Digital Collections; 
Disabilities; Elect. Discussion Lists; Electronic Journals; 
Financial Aid; Genealogy; Grants; Job Listings; Libraries; 
Library Terms; Maps, and much more. 


World Fact Book
(http://www.odci.gov/cia/publications/nsolo/wfb-all.htm) : It is revised and updated
every year, and compiled by the CIA. This is a good site for
basic factual information on all the countries of the world:
geography, people, government, economy, communications, 
transportation and military. It also provides access to various 
maps. 


The World Wide Gazetteer
(http://www.c-allen.dircon.co.uk) : This site offers a substantial amount of
practical information from country fact-files, drawing on a huge 
database, with links to maps, sources of information on the 
Web, current affairs, government and the economy, travel 
and flight data, and other resources. 


InfoNation
(http://www.un.org/Pubs/CyberSchoolBus/infonation/e_icount.htm) : It is a service from the United
Nations. This two-step database allows you to view and
compare the most up-to-date statistical data for the member
states of the UN. You can select up to seven countries from
the list at any one time, and thereafter choose up to four data
fields set out under four major headings - geography,
economy, population, social indicators - each with a range of
subheadings. For example, under ‘Social indicators’ are
illiteracy rate, school enrolment, spending on education,
newspaper circulation, etc.


Information Please (http://www.infoplease.com) : It is 
a large, free reference site for searching facts and statistics, 
and finding answers in almanacs, encyclopaedias and 
dictionaries. 


Internet Public Library (http://ipl.sils.umich.edu) : 
Developed by librarians, this directory contains a very large 
number of links to general reference as well as literary 
resources. 


COURSEWARE DIRECTORIES / TEACHING 
RESOURCES / TRAINING MATERIALS 


Such resources are a hub of courseware material,
teaching resources and training materials. Teachers in the social
sciences are now using the Internet as a source of teaching
materials. A number of sites exist on the Internet that are of
immense educational value. Examples include:


World Lecture Hall (WLH)
(http://www.utexas.edu/world/lecture/index.html) : It provides an entry point to free
online course materials from around the world. WLH has 83
categories to browse and locating tools such as the Find a Course
and Advanced Search utilities. If you cannot find what you
are looking for on WLH, send an email or check About
WLH, which contains a Useful Links page with links to sites
on topics such as distance learning, degree programs and the Center
for Instructional Technologies. All these services are free.
WLH contains links to course
materials for university-level courses. Some, though not all,
of these courses are offered entirely over the Internet, while
some offer college credits through distance learning. All are
courses offered at accredited colleges and universities around
the world, and all course materials reachable through WLH
are free and publicly available.


Teaching Resources
(http://socsci.colorado.edu/POLSCI/RES/teaching.html#r) : A collection of teaching
resources is available at this site. The site also includes
syllabi, online courses and publishers. These websites
allow you to search for possible textbooks to use in your 
courses. Some allow you to order an examination copy online. 


SocInfo, the CTI Centre
(http://www.stir.ac.uk/Departments/HumanSciences/Socinfo/) : SocInfo, the CTI
Centre for Sociology, Politics and Social Policy, which is
located at the University of Stirling, is one of twenty-four
subject-based Centres within the UK. Aiming to encourage
the effective use of technology in order to improve the overall
quality of teaching and learning in the higher education
disciplines of sociology, politics and social policy, its site also
provides the following: SocInfo Guide to IT Resources;
SocInfo Useful Resources; GATEWAYS; Mailing List 
Archives; Other European, US and overseas sources; 
General UK sources; Social Sciences list servers and 
newsgroups; Quantitative Data Analysis; Statistics Software 
News Groups; Qualitative Data Analysis; and Books. 


Study Web - Links For Learning
(http://info.studyweb.com/) : Study Web has officially become part of The Lightspan
Network and is now available only by school subscription.
The school license includes families and children, so you will
have access to the thousands of prescreened Links for
Learning.


VIRTUAL AND REMOTE EXPERIMENTAL 
LABORATORIES 


A shared virtual learning environment is a resource-based
approach to the provision of learning materials. It
addresses some of the pressures on higher education caused
by the changing student population. Students will be able to
access on-line learning resources and keep in touch with
coursemates from their home computers.


Discursive learning labs are very good for social 
sciences, languages, and humanities subjects. In the 
languages they are an effective way for students to practice 
their writing, speaking and listening skills among a peer group. 
They enable remote students to take part in the group 
discussion and arguments that are central to the humanities 
and social sciences. In fact, the collaboration tools that 
underlie discursive learning labs provide an excellent way for 
remote students to keep in touch with coursemates. 
Discussion boards should be adopted as a central component 
of a shared virtual learning environment. However, it may be
noted that remote laboratories without manipulation are
effectively just video conferencing suites. Examples include:


Web-Lab Information Page
(http://weblab.badm.sc.edu/web-lab-information/web-lab-information.htm) :
Web-Lab is a Digital Library and Virtual
Laboratory for Experimentation in the Social Sciences.
Funded by the Digital Library Initiative at the National Science
Foundation, it provides an on-line library of experimental
software and data for research and teaching in Economics
and Sociology, and it provides a network of collaborating
laboratories for conducting experimental sessions. One needs
only to click on the world map to connect to Web-Lab
Collaboratories around the world.


The site also provides pointers to other
collaborating experimental sites in North America, Europe,
and New Zealand. At the University of South Carolina there
are three experimental laboratories: ExNet in Sociology, 
Clower-Lab in Economics and Beam-Lab in Business and 
Economics that can be accessed from this site. 


DIRECTORIES 


Directories are search services which involve
human input in identifying relevant resources and allocating
them to a particular subject category or theme. Tools are usually
searchable and browsable via the directory. Resources are
not evaluated in terms of quality prior to their inclusion in the
directory. Directories abound in the social
sciences on the Internet. All types of directories are available,
be they directories of specialists, e-mail directories, institutional directories or the like.
Examples include: 


DARE: Directory in Social Sciences - Institutions,
Specialists, Periodicals
(http://www.unesco.org/general/eng/infoserv/db/dare.shtml) : This covers about 11,000
worldwide references to social science research and training 
institutions, specialists, documentation and information 
services, and social science periodicals; also has references 
to peace and human rights training and research institutions. 


DRIS - Directory of Research Information Systems, a
Worldwide Survey of Research Information Systems
(http://www.niwi.knaw.nl/cgi-bin/nph-dris_search.pl) : NIWI, the
Netherlands Institute for Scientific Information Services, is
an institute of the Royal Netherlands Academy of Arts and
Sciences (KNAW). Formally started on 1 September 1997
through the merger of six existing institutes in the area of
scientific information provision, NIWI aims to provide
scientific information in the fields of social sciences,
history and Dutch language and literature.


The Worldwide Email Directory of Anthropologists
(WEDA)
(http://wings.buffalo.edu/academic/department/anthropology/WEDA/) : The Worldwide Email Directory of
Anthropologists (WEDA) is a searchable database of address
and research information about anthropologists from around
the world. This is a completely volunteer project, established
to encourage and aid scholarly communication. Here,
anthropology is taken in its widest sense, to include physical,
earth, and social scientists, as well as their colleagues in the
humanities.


ONLINE DOCUMENTS 


Traditional textbooks are not commonly made freely
available over the Internet. However, some academics have
created their own online textbooks and these are often of a
very high quality, with comprehensive coverage and excellent
interactivity. Examples include:


Research Methods Knowledge Base
(http://trochim.human.cornell.edu/kb/index.htm) : The Research
Methods Knowledge Base is a comprehensive web-based 
textbook that addresses all of the topics in a typical 
introductory undergraduate or graduate course in social 
research methods. It covers the entire research process,
including formulating research questions; sampling
(probability and non-probability); measurement (surveys,
scaling, qualitative, unobtrusive); research design
(experimental and quasi-experimental); data analysis; and
writing the research paper. It also addresses the major
theoretical and philosophical underpinnings of research,
including the idea of validity in research, reliability of
measures, and ethics. The Knowledge Base was designed
to be different from the many typical commercially-available 
research methods texts. It uses an informal, conversational 
style to engage both the newcomer and the more experienced 
student of research. It is a fully hyperlinked text that can be
integrated easily into an existing course structure or used as
a sourcebook for the experienced researcher who simply
wants to browse.


Mindweave: Communication, Computers and
Distance Education
(http://icdl.open.ac.uk/lit2k/LitResult.ihtml?&id=162) : It is a collection of 31 papers, 28
of which were presented at the International Conference on
Computer Mediated Communication in Distance Education
at the United Kingdom Open University in October 1988. The
CDML helps the navigator in three ways: (1) Look
at what other people are doing; (2) Identify techniques that fit
your needs; (3) Identify resources inside and outside the
university to implement those techniques.


(B) By Category/Resource Types 


The resources grouped under this heading are of
utmost significance to social scientists, as they
exist to pool resources of various forms and types
under one umbrella.
Hence the resources are subject-oriented, and some degree of
quality control is also maintained while indexing such resources under
the categories listed below, except for search engines.
Starting from these resources while looking for
information on any area of social science will therefore be fruitful.


Following are the main categories with suitable examples for 
each case. 


• Subject-based Gateway Services and Virtual Libraries
• Resource Guides/Resource Catalogues
• Subject Catalogues and Directories
• Rating and Reviewing Services
• Search Engines and Meta Search Engines
• Search Tools and Search Tips


SUBJECT-BASED GATEWAY SERVICES AND VIRTUAL 
LIBRARIES 


Subject gateways are online services and sites that
provide searchable and browsable catalogues of Internet-based
resources. Subject gateways will typically focus on a
related set of academic subject areas. The simplest types of
subject gateways are sets of Web pages containing lists of
links to resources. Some gateways index their lists of links
and provide a simple search facility. More advanced gateways 
offer a much enhanced service via a system consisting of a 
resource database and various indexes, which can be 
searched and/or browsed through a Web-based interface. 
Each entry in the database contains information about a 
network-based resource, such as a Web page, Web site, 
mailing list or document. Entries are usually created by a 
cataloguer manually identifying a suitable resource, 
describing the resource using a template, and submitting the 
template to the database for indexing. Subject gateways are 
also known as subject-based information gateways (SBIGs), 
subject-based gateways, subject index gateways, virtual 
libraries, clearing houses, subject trees, pathfinders and other 
variations thereof. 
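

The workflow described above (a cataloguer fills in a template, submits it to a resource database, and the database is indexed for searching and browsing) can be sketched in a few lines of Python. This is only an illustrative model and not the software of any gateway named in this chapter; the class names, field names and the simple keyword index are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceEntry:
    """Template a cataloguer fills in for one network-based resource."""
    title: str
    url: str
    description: str
    subjects: list = field(default_factory=list)


class Gateway:
    """Hypothetical resource database with a keyword index."""

    def __init__(self):
        self.entries = []
        self.keyword_index = {}   # word -> positions of entries containing it

    def submit(self, entry):
        """Store a completed template and index its title and description."""
        position = len(self.entries)
        self.entries.append(entry)
        text = (entry.title + " " + entry.description).lower()
        for word in text.split():
            word = word.strip(".,;:()")
            self.keyword_index.setdefault(word, set()).add(position)

    def search(self, keyword):
        """Return the entries whose title or description contains the keyword."""
        positions = self.keyword_index.get(keyword.lower(), set())
        return [self.entries[i] for i in sorted(positions)]

    def browse(self, subject):
        """Return the entries catalogued under a subject heading."""
        return [e for e in self.entries if subject in e.subjects]


gateway = Gateway()
gateway.submit(ResourceEntry(
    title="UK Data Archive",
    url="http://www.data-archive.ac.uk/",
    description="Collection of digital data in the social sciences.",
    subjects=["Social Sciences"],
))
for entry in gateway.search("archive"):
    print(entry.title, entry.url)
```

Searching uses the index built at submission time, while browsing relies on the subject headings assigned by the cataloguer; this is the human input that distinguishes a gateway from a search engine.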


There is a considerable number of Web-based
gateways that can be used to locate network-based resources
in a particular subject area. Nearly all of these gateways have
unique features, additional subject-based services, and
different approaches to how information about network-based
resources is stored in the resource database. Examples
include:


Social Science Information Gateway
(http://sosig.esrc.bris.ac.uk) : The Social Science Information
Gateway (SOSIG) is a freely available Internet service which
aims to provide a trusted source of selected, high quality
Internet information for students, academics, researchers and
practitioners in the social sciences, business and law. It is
part of the UK Resource Discovery Network. SOSIG is a kind
of one-stop shop for navigators looking for social science
resources. Its features include:


Its SOSIG Internet Catalogue is an online database of 
high quality Internet resources. It offers users the chance to 
read descriptions of resources available over the Internet and
to access those resources directly. The Catalogue points to
thousands of resources, and each one has been selected 
and described by a librarian or academic. The catalogue is 
browsable or searchable by subject area. 


Its Social Science Search Engine is a database of over 
50,000 Social Science Web pages. Whereas the resources 
found in the SOSIG Internet Catalogue have been selected 
by subject experts, those in the Social Science Search Engine 
have been collected by software called a ‘harvester’. All the
pages collected stem from the main Internet catalogue; this
provides the equivalent of a social science search engine.
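

How such a harvester might work can be illustrated with a short Python sketch. The actual SOSIG software is not described here, so the approach below is an assumption: starting from the pages already listed in the catalogue, the program follows the links found on each page and collects the pages it reaches. The names LinkCollector, harvest and fetch are hypothetical, and the pages are held in memory rather than fetched from the Internet.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def harvest(catalogue_urls, fetch, max_depth=1):
    """Collect the pages reachable from the catalogue, level by level.

    fetch is any function that returns the HTML text of a URL
    (an assumption made so that the sketch needs no network access).
    """
    seen = set(catalogue_urls)
    frontier = list(catalogue_urls)
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            parser = LinkCollector()
            parser.feed(fetch(url))
            for link in parser.links:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen


# Hypothetical in-memory pages standing in for real Web sites.
pages = {
    "catalogue-entry": '<a href="page-1">One</a> <a href="page-2">Two</a>',
    "page-1": '<a href="page-3">Three</a>',
}
print(sorted(harvest(["catalogue-entry"], lambda url: pages.get(url, ""))))
```

Because every harvested page is reached from a resource that a subject expert has already selected, the resulting collection stays close to the social sciences even though no person has examined each page.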


Its Social Science Grapevine is the ‘people oriented’ 
side of SOSIG, offering a unique online source of career 
development opportunities for social science researchers in 
all sectors. Grapevine carries details of relevant training and 
development opportunities from employers and training 
providers. Researchers can also make their CVs available
online, where they are freely accessible to all visitors to the site.
Grapevine’s Likeminds section provides a forum for exchange 
of ideas and information about potential research 
opportunities and partnerships. If you want to find contacts in 
your field you can also check the social science departmental 
database. 


‘My Account’ refers to your personal account on SOSIG.
Use your profile to create your own personal view of the Web, 
using the high quality SOSIG catalogue and the massive 
harvested “research-engine” database. You can find out which 
resources have been added recently in your area and register 
for regular email notification of important developments in 
your special area of interest. ‘My Account’ can also be used 
to post details of new social science conferences and courses 
to Grapevine. 


In nut shell, it is an excellent online catalogue of 
thousands of high-quality Internet resources relevant to social 


Internet and Internet Resources 543 


science education and research. Every resource has been 
selected and described by a librarian or subject specialist,
and the catalogue can be searched or browsed. It could be the
navigator of first choice for finding social science resources.


Argus Clearinghouse (http://www.clearinghouse.net): 
It is a central access point to more than 1,200 topic-specific
guides to Internet resources, each of which is critically
evaluated. 


The Socio Web (http://www.socioweb.com/~markbi/socioweb/) :
It lists sociological resources on the net under
12 different categories within this subject area. It provides 
links to indexed resources. 


Social Sciences’ Virtual Library
(http://www.clas.ufl.edu/users/gthursby/socsci/index.htm) : This
document keeps track of online information as part of The 
World-Wide Web Virtual Library. Sites are inspected and 
evaluated for their adequacy as information sources before 
they are linked from here. It is a useful site for reaching important
resource directories, guides, data archives, social sciences
scholarly societies or professional associations and other
similar resources in the social sciences.


Business Information Sources on the Internet
(http://www.dis.strath.ac.uk/business) : This is a good selective
guide to some of the best Internet sites that contain business 
information - for example, company directories, business 
news, market information, statistical, economic and export 
data etc., as well as country information - with an emphasis 
on European, particularly UK resources. 


Biz/ed (http://www.bized.ac.uk/) : Biz/ed is a unique 
business and economics service for students, teachers and 
lecturers. The Biz/ed Internet Catalogue aims to provide a 
trusted source of selected, high quality internet information 
for students, researchers and practitioners in the areas of
business, management and economics.


Geo-Information Gateway (http://www.geog.le.ac.uk/ 
cti/info.html) : It is an index of on-line Geo-Information 
resources for university staff and students in the spatial 
disciplines. Links are usually arranged under these standard 
headings— International and National Organizations; 
Research Centres and Projects; Data/Information Libraries; 
Educational Materials/Online Courses and Electronic 
Journals. 


RESOURCE GUIDES/RESOURCE CATALOGUES 


Resource Guides/ Resource Catalogues provide a 
comprehensive collection of resources preferably under 
various subject categories. Examples include: 


British Education Internet Resource Catalogue
(http://brs.leeds.ac.uk/~beiwww/beirc.htm) or
(http://www.jisc.ac.uk/dner/development/projects/briteduport/) :
The British Education Internet Resource Catalogue is 
designed to aid the identification of useful internet resources 
by people with a professional or scholarly interest in education 
or training. The Catalogue provides descriptions and 
hyperlinks for evaluated internet resources within an indexed 
database. The collection aims to list and describe significant 
information resources and services specifically relevant to 
the study, practice and administration of education at a 
professional level. Its primary audience comprises
researchers, teachers and students in the field of education 
in the higher and further education sectors of the United 
Kingdom but its value stretches beyond those groups. 


The Catalogue is produced by a network of university-based
correspondents providing content to the project
management office, the British Education Index within Leeds
University Library. It is the first deliverable from the three-year
British Education Portal project, which began in October 2000.
Both the Catalogue and the Portal provide services
operating within the Resource Discovery Network, with project
funding being provided by the Joint Information Systems 
Committee. The searchable Catalogue is freely available 
together with detailed information about its purpose and use. 


Resource Guide to the Social Sciences
(http://www.jisc.ac.uk/subject/socsci/) : The Resource Guide for
the Social Sciences has been set up to raise awareness of 
the range of resources available and to offer a variety of 
activities to promote effective use of the resources. Each 
section of the guide includes a list of resources with access 
descriptions. Some resources are freely available and can 
be accessed immediately. Others are conditionally free and/or
require subscription and/or registration. Resources that
are conditionally free or require subscription and/or 
registration may require a password that is issued by the 
subscribing institution. A range of resources has been set up 
specifically to meet the needs of those working and studying 
in the social sciences. The guide focuses on resources funded 
by the Joint Information Systems Committee (JISC) of the 
Higher and Further Education Funding Councils of the UK, 
and the Economic and Social Research Council (ESRC). 
Resources covered are— Bibliographic, reference and 
research information; Publications online; Subject gateways; 
Data services; Datasets; Spatial datasets; Data visualization; 
Software services and support for data processing; Images, 
moving pictures and sound; Learning and Teaching; and 
Support services. 


ASIL Guide to Electronic Resources for International
Law (http://www.asil.org/resource/home.htm) : The ASIL
Guide to Electronic Resources for International Law has been 
published by the American Society of International Law since 
1997, and is continuously being updated and expanded. A 
user-friendly feature of the guide is a Quick Links option, which
accesses a list of all of the links for each section of the guide
as well as the full text of each guide section. The guide offers 
links to various areas within the subject area of International 
Law. 


Yahoo-Social Sciences (http://d1.dir.dcx.yahoo.com/social_science/) :
Yahoo has over 40 categories under its
social science resource guide, providing access to subjects
ranging from Anthropology to Women’s Studies. Besides, other
resources such as books, forums, bibliographies and the like
are also included.


BUBL (http://bubl.ac.uk/link/soc.html) : It is a catalogue
of 12,000 selected Internet resources. It is an Internet-based
information service for the UK higher education community. 
In Social Sciences alone it has 282 categories under which 
resources are listed. BUBL was the first national UK service 
to offer its users subject-based access to the Internet, through 
the BUBL subject tree initiative, which began in 1993. In the 
subject tree, resources are arranged together by subject 
area—all accountancy resources are located together, as are 
all geology, library and physics resources etc. The original 
gopher-based subject tree was soon supplemented by a Web- 
based one, and both have now been incorporated into BUBL 
LINK (Libraries of Networked Knowledge). LINK contains 
thousands of links to Internet resources and services, and 
covers all main subject areas. Resource descriptions are 
searchable and subjects can be browsed by alphabetical order 
or Dewey Decimal Classification. BUBL subscribes to 
numerous mailing lists which announce new resources and 
services on the Internet, giving their URL and a description 
of their content. Individuals also contact the service, sending 
information about a resource or service, along with its URL. 
BUBL staff evaluate these, and decide whether or not they 
are suitable for inclusion on the service. 




Internet Crossroads in the Social Sciences
(http://dpls.dacc.wisc.edu/newcrossroads/index.asp) : DPLS
maintains the Internet Crossroads as an annotated list of data- 
related links to web sites useful to social science researchers. 


SUBJECT CATALOGUES AND DIRECTORIES 


Subject Catalogues and Directories are similar to 
resource guides or resource catalogues with a subtle 
difference in terminology. It should be noted that subject
catalogues and directories do not discriminate between sites
in terms of their quality. Those involved in developing and
maintaining the tools are concerned with the subject relevance 
of the materials and not necessarily with their quality - this 
contrasts with virtual libraries and subject-based gateway 
services in particular, where humans are involved in 
identifying potentially relevant resources and also in 
evaluating their quality. Examples include: 


Yahoo (http://www.yahoo.com) : Launched in 1994, it is
one of the oldest and largest subject directories, covering
over 750,000 websites divided into 25,000 categories.


Galaxy (http://galaxy.einet.net/) : Initiated in 1993, it
went live in 1994. It claims to be the oldest directory. It is
similar to Yahoo, being searchable and browsable.


RATING AND REVIEWING SERVICES 


The Internet is an ocean of resources, with thousands of sites
being added every week to the existing pool of resources.
Finding a good resource in such a colossal resource base is
therefore sometimes a problem. By accessing the sites of rating
and reviewing services, you can get a list of reviewed Internet
sources in the respective subjects. Examples of this type include:


Jumpcity (http://www.jumpcity.com/search-page.html) :
It reviews sites in all subjects. It provides keyword as
well as subject search.




Lycos Top (http://search.lycos.com/) or
(http://point.lycos.com/) : It reviews sites in all subject areas. It
provides a link to (http://www.alltheweb.com/) for further
confirmation of the quality of ranked sites.


SEARCH ENGINES AND META SEARCH ENGINES 


The social sciences information resources available via
the Internet are colossal. There may be instances when one does
not know, or is not aware of, any of the above resources or sites.
Where does one go? The answer is search engines. Search
engines are one of the primary ways of finding resources on
the Internet. Their programs, called spiders or crawlers, constantly
visit web sites on the Internet so as to create catalogues of
web pages. However, searching with search engines at times
returns a huge number of hits, which it is not feasible to go
through. Examples of search engines
include:


Altavista (http://altavista.digital.com) : It is one of the 
very best and most efficient search engines, which also 
translates the text of Web sites for you, in six languages. Do
click on their ‘Help for simple query’ before you start your
search, and print out some of the information. It also offers a
range of helpful ‘Refine search options’.


BullsEye (http://www.intelliseek.com/prod/bullseye.htm) :
It is an impressive free program for Windows
only that enables you to query a large number of search
engines simultaneously by searching over 700 of the major 
search engines and major information sources on the Web. 
It highlights your search terms in the matching Web pages. 
You can pre-select your searches under thirteen broad 
categories — including book searches, and online catalogue 
searches of the Library of Congress database, and you can 
choose three star-levels of precision. If you choose the third
option, it will download all the pages and analyze the results,
and BullsEye’s integrated search engines enable you to
continue your search locally within the pages found, without 
having to be connected to the Internet. 


Copernic (http://www.copernic.com) : This is another 
sophisticated Internet search application program that can 
be downloaded for free. It can query several search engines, 
directories, Usenet archives and e-mail address databases 
at once, providing access to some 30 major search engines 
and other information sources. It displays the results on a 
single page in your browser, stores them on your disk, 
organizes them in a clear manner by relevance and score, 
and you can later retrieve them for offline browsing. 


Debriefing (http://www.debriefing.com) : It is a good 
metasearch engine that finds the first 50 or so best matches 
to a search query from Yahoo, Altavista, Excite and Hotbot 
very quickly. 


Dogpile (http://www.dogpile.com) : This is also a 
metasearch engine that searches several search engines at
once and then ‘fetches’ what it has found. It allows you to
customize your search by choosing which engine should be 
searched first, second, third, etc., among a list of 25. It is 
quite good for single-word searches. 


Google (http://www.google.com) : Google is a superior
and innovative search engine that uses the number of links
to a site to rate its importance. Its ‘I’m Feeling Lucky’ button
automatically takes you to the first Web page - i.e. the best
result/match - returned for a query. Click on the bar graph at
the beginning of the result to see which pages link to the
particular page. It is definitely one of the coolest search
engines and can be strongly recommended.
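The idea of rating a page by the number of links pointing to it can be sketched as a simple count over a link table. This is only an illustration under stated assumptions: the `LINKS` table, the site names and the `rank_by_inbound_links` function are hypothetical, and a plain in-link count is a deliberate simplification, not Google’s actual ranking method.

```python
from collections import Counter

# Hypothetical link table: each site maps to the sites it links to.
LINKS = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}


def rank_by_inbound_links(links):
    """Return (site, inbound link count) pairs, most-linked site first."""
    counts = Counter()
    for source, targets in links.items():
        counts[source] += 0  # make sure sites with no inbound links appear
        for target in set(targets):  # count each linking site only once
            if target != source:  # ignore links from a site to itself
                counts[target] += 1
    # Sort by descending count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for site, inbound in rank_by_inbound_links(LINKS):
        print(site, inbound)
```

In this toy table, `c.example` comes first because three other sites link to it, while `d.example` comes last because nothing links to it.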


Hotbot (http://www.hotbot.com) : It is an easy-to-use,
fast and powerful search engine, and one of the most
consistently top-rated. It offers a single form, with a pull-down
menu, to search for a phrase, person, or any combination of search
terms and to fine-tune your output preferences. It is good for ‘exact
phrase’ searches, and also lets you search for sites with audio
or video features.


Magellan (http://www.magellan.excite.com) : It is a
good online guide to the Web, together with reviews by broad
subject areas; with ‘Search Voyeur’, an intriguing real-time
site that allows you to ‘spy’ on searchers.


Metacrawler (http://www.metacrawler.com) : It is a
multiple-search resource which takes the best results from
other search engines and directories and returns up to 30
hits from each site.
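The way a metasearch engine combines results can be sketched as follows: take a limited number of hits from each engine, in a chosen engine order, and drop duplicates. This is a minimal sketch under stated assumptions: the engine names, the sample result lists and the `metasearch` function are hypothetical, no real search engine is contacted, and none of the services listed above is claimed to work exactly this way.

```python
def metasearch(results_by_engine, per_engine_limit=30):
    """Merge per-engine result lists, keeping the first copy of each URL."""
    merged = []
    seen = set()
    # Engines are consulted in the order in which they are given.
    for engine, urls in results_by_engine.items():
        for url in urls[:per_engine_limit]:  # at most N hits from each engine
            if url not in seen:
                seen.add(url)
                merged.append((url, engine))
    return merged


# Hypothetical result lists standing in for real engine responses.
SAMPLE = {
    "engine_one": ["x.example", "y.example", "z.example"],
    "engine_two": ["y.example", "w.example"],
}

if __name__ == "__main__":
    for url, engine in metasearch(SAMPLE, per_engine_limit=2):
        print(url, "from", engine)
```

Changing the order of the entries in `SAMPLE` changes which engine is credited with a shared hit, which loosely mirrors the option of choosing which engine is searched first.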


SEARCH TOOLS AND SEARCH TIPS
Search tools and search tips are also available on the Net.
Examples of search tools and search tips include:


Searchability: Guides to Specialized Search Engines 
(http://www.searchability.com) : It is a good descriptive guide 
to specialized search engines, and to search directories of 
search engines, evaluating their subject coverage and 
effectiveness. 


Search IQ (http://www.searchiq.com) : It provides
access to a very large number of search engines and
directories, and combines this with independent search-engine
reviews and rankings, to help you identify the right
search tool for the job - and those with the highest IQ! It also
offers tips to improve your searching techniques, with tutorials
and guides.


Search Tools: A Guide
(http://www.mmu.ac.uk/h-ss/dic/main/search.htm) : It is an excellent guide to, and
evaluations of, the major European Internet search tools, 
metasearch engines, e-mail address finders, and various
special search tools. It is maintained by Richard Eskins at
the Department of Information & Communications at 
Manchester Metropolitan University. 


The Spider’s Apprentice (http://www.monash.com/spidap.html) :
How to get the most from search engines, tips
for search strategies, analysis of search engines, frequently
asked questions, and more are available on this site. This is a
useful guide for beginners.

ZD-Net/PC Magazine Online: Your Complete Guide
to Searching the Net
(http://www.zdnet.com/pcmag/features/websearch/intro.htm) :
It reviews and assesses the
search engines, indexes and directories.


(C) By Organization 


Under this category, resources have been grouped on
the basis of type of organization in the area of Social Sciences 
providing access to their resources via the Internet. These 
could be a collection of WWW pages that are created and 
maintained by a particular organization and/or various 
services and products they offer. Organizational sites always 
include subject-based pages and personal home pages. 
These resources could be from: 


• Scholarly Societies for the Social Sciences
• International Organizations
• Other Important International Centres and Associations
• Individual Organizations
• Networking Organizations
• University Departments


SCHOLARLY SOCIETIES FOR THE SOCIAL SCIENCES 


These comprise Councils, Federations, and Online
Directories. Examples include:




American Council of Learned Societies
(http://www.acls.org/jshome.htm) : It is a major coalition of scholarly
organizations in the United States. The web site offers a linked
list of constituent learned societies and links to other online
scholarly resources.


Associations on the Net: Social Sciences
(http://www.ipl.org/cgi-bin/ref/aon.out.pl?id=soc0000) : It is a part
of the guide provided by the Internet Public Library. It is a useful
source for finding social science associations that exist on
the Internet.


Scholarly Societies Project
(http://www.lib.uwaterloo.ca/society/subjects_soc.html) : This is
a set of links to a large number of professional organizations, 
arranged by discipline or subject-matter field. It is a service 
of the Scholarly Societies Project at the University of Waterloo 
(Canada). 


Social Science Research Council
(http://www.ssrc.org/index.htm) : The SSRC is an independent,
non-governmental, not-for-profit, international association 
devoted to the advancement of interdisciplinary research in 
the social sciences. It sponsors interdisciplinary workshops 
and conferences, fellowships and grants, summer training 
institutes, scholarly exchanges, and publications. There is a 
useful hyperlink index of funders, affiliated institutes, and other 
organizations. 


Social Sciences and Humanities Research Council of
Canada (http://www.sshrc.ca/) : It reports on grant support,
conferences, projects, and current research.


INTERNATIONAL ORGANIZATIONS 


Important international organizations engaged in the social
science field are listed here.




United Nations Economic and Social Council
(http://www.un.org/Overview/Organs/ecosoc.html) : The Economic
and Social Council was established by the Charter as the 
principal organ, under the authority of the General Assembly, 
to promote higher standards of living, full employment, and 
conditions of economic and social progress and development 
worldwide. This site contains information on current UN 
development programmes as well as a selection of surveys 
of economic and welfare conditions throughout the world. Full
texts of all of the UN’s sessional documents from 1994 onwards
are also available.


Peace and Conflict - The Home of Peace Studies on
the World Wide Web (http://csf.colorado.edu/peace/) : Put
up by the Peace Studies Association (PSA), this site is a useful
resource for a calendar of peace studies conferences and
events; courses and syllabi of peace research centres,
institutes and organizations; a peace studies discussion group;
and much more in the area.


OTHER IMPORTANT INTERNATIONAL CENTRES AND ASSOCIATIONS


ICPSR (http://www.icpsr.umich.edu/) : The
Inter-university Consortium for Political and Social Research 
(ICPSR), established in 1962, is an integral part of the 
infrastructure of social science research. ICPSR maintains 
and provides access to a vast archive of social science data 
for research and instruction, and offers training in quantitative 
methods to facilitate effective data use. 


IASSIST (http://datalib.library.ualberta.ca/iassist/) :
International Association for Social Science Information
Services and Technology or IASSIST is an organization
dedicated to the issues and concerns of data librarians, data 
archivists, data producers, and data users. This unique 
professional association assists members in their support of 
social science research. 




The Center for Basic Research in the Social Sciences
(http://www.cbrss.harvard.edu/index.htm) : The Center for
Basic Research in the Social Sciences (CBRSS) was founded 
at Harvard University in June 1998 under the auspices of the 
Faculty of Arts and Sciences (FAS). The mission of Harvard 
University’s Center for Basic Research in the Social Sciences 
is to foster and improve basic social scientific research. It 
also supports seminars, workshops, and conferences, as well 
as a variety of teaching and training activities, including 
financial support for graduate dissertation research, for other 
Harvard research opportunities, and for implementation of 
innovative courses, besides other services. The site proves 
a good resource for those interested in basic research in the 
social sciences. 


INDIVIDUAL ORGANIZATIONS 
Some important individual efforts are listed here : 


Cheiron: The International Society for the History of 
Behavioural and Social Sciences (Canada) (http:// 
www.yorku.ca/dept/psych/orgs/cheiron/cheiron.htm) : It is 
an association mainly of academic professionals, with 
headquarters at York University in Canada. 


International Statistical Institute (Netherlands)
(http://www.cbs.nl/isi/) : Established in 1885, the Institute is an
autonomous society which seeks to develop and improve 
statistical methods and their application through the promotion 
of international activity and co-operation. 


NETWORKING ORGANIZATIONS 


Social Science Research Network (SSRN)
(http://www.ssrn.com/index.html) is devoted to the rapid worldwide
dissemination of social science research and is composed of 
a number of specialized research networks in each of the 
social sciences. These networks are: Accounting Research
Network (http://www.ssrn.com/arn/index.html); Economics
Research Network (http://www.ssrn.com/ern/index.html);
Financial Economics Network (http://www.ssrn.com/fen/index.html);
Legal Scholarship Network (http://www.ssrn.com/lsn/index.html);
Management Research Network (http://www.ssrn.com/mrn/index.html);
and Latin American Network (http://www.ssrn.com/lan/index.html).


UNIVERSITY DEPARTMENTS 


These also provide very useful resources in the given 
subject area and have already been discussed under CWIS 
in this chapter. 


HOW TO KEEP UP-TO-DATE WITH NEW SOURCES


The Internet can be useful for maintaining current 
awareness within a particular subject field. It is important to 
keep track of new sites that are added in the concerned
area of specialization. The following are some such sites that 
can help you in keeping up-to-date with new resources. 


Scout Report For Social Science
(http://www.scout.cs.wisc.edu/) : The Scout Report is one of the
Internet’s longest-running weekly publications, offering a 
selection of new and newly discovered online resources of 
interest to researchers, educators, and anyone else with an 
interest in high-quality online material. This is a very useful 
regularly updated source of information for new resources 
covering all types of resources like full-text papers; table of 
contents for new journals; forthcoming conferences; statistics 
and much more. The Report is available both on the web 
site, and in e-mail form via mailing list subscriptions. Past 
issues of the Scout Report, as well as past issues of the 
discontinued subject-specific Reports, are available from this 
site as back issues pages. 

WWW Social Sciences Newsletter
(http://www.clas.ufl.edu/users/gthursby/socsci/news.htm) : This
online newsletter is a part of the Social Sciences WWW Virtual
Library and is provided as a service to the World-Wide Web
community. To receive the listings of new or improved WWW
sites by e-mail, you may subscribe to the service
by sending an e-mail message to majordomo@clas.ufl.edu
with the following information in the body of the message:
subscribe socsci-news “Firstname Lastname” e-mail address.
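The subscription message described above can be composed programmatically. The sketch below only builds the message and does not send it. The subscriber name and address are placeholders, the function name `build_subscription` is hypothetical, the use of straight quotation marks around the name is an assumption, and the mailing list itself may no longer be in operation.

```python
from email.message import EmailMessage


def build_subscription(first, last, address):
    """Build (but do not send) the subscription request described in the text."""
    msg = EmailMessage()
    msg["From"] = address
    msg["To"] = "majordomo@clas.ufl.edu"
    # The command goes in the body of the message, not in the subject line.
    msg.set_content(f'subscribe socsci-news "{first} {last}" {address}')
    return msg


# Placeholder subscriber details, used for illustration only.
message = build_subscription("Asha", "Rao", "asha.rao@example.org")

if __name__ == "__main__":
    print(message)
```

Sending the message would additionally require an outgoing mail server, for example through the standard `smtplib` module; that step is left out here because it depends on local settings.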


Yahoo - Social Sciences Net Events
(http://search.yahoo.com/bin/search?p=Social+Sciences+Net+Events) :
It provides information on upcoming events, local
chapters, research and jobs, personal accounts and daily
discussion of world events, trivial happenings, etc.


Internet Resources Newsletter
(http://www.hw.ac.uk/libWWW/irn/irn.html) : This is a free monthly newsletter for
academics, students, engineers, scientists and social
scientists. It is possibly the most useful round-up of new resources
for academic users.


Netsurfer Digest (http://www.netsurf.com/nsd/) :
Netsurfer now charges for full access to its
content; however, some material is still free. Full details
about the new subscription system
can be found at http://www.netsurf.com/subs_letter.html


BUBL LINK Updates (http://bubl.ac.uk/link/updates/current.html) :
It lists selected Internet resources covering all
academic subject areas. One can select the appropriate subject
area, such as social sciences, and find out about new resources.


5.9.3. Internet Resources in Science and Technology 


Science and Technology (S&T) covers vast
subject areas in its ambit. Broadly, there are more than 18
subject areas as per Colon Classification and 191 subdivisions
as per DDC 21st Edition, within the broad subject areas.




There are many types of information resources
available, often more than we know what to do with.
The Internet has brought qualitative improvements in
scholarly communication by facilitating global connectivity of
computers and the development of various search tools and
techniques for accessing networked information. The Internet has
an estimated global online population of 540 million; if at
least something of use is available for each of them, the
quantum of information resources available on the Internet
is colossal. The resources available via the Internet are
constantly increasing and changing, so any list is liable
to become outdated. The resources have been broadly categorized
type-wise for better understanding. However, since various
services and resources provide facilities that are
common, overlap of resources under various types is
inevitable. Only selected resources are given as examples
for reference purposes; it is neither feasible nor possible
to include a comprehensive list.
The three groups under which the resources have been
categorized in order to make them comprehensible are:


• By Source Type
• By Category
• By Organization


(A) By Source Type 


When the Web first emerged, Web sites consisted of 
relatively small collections of static and text-only data. The 
complexity of today’s Web sites reflects powerful new tools 
for developing and managing them. Scripting languages 
enable Web developers to manage large amounts of data 
and create customizable interfaces that respond to user 
needs, while information architecture techniques ensure that 
Web users quickly and efficiently locate information. With such
developments taking place, and the ease with which all types
of sources get created and mounted on the Net, it is becoming
hard to imagine any type of source that has been left out.


There are several types of sources available in Science 
& Technology, such as: 


• Electronic Journals and Newsletters
• Table of Contents
• Preprints
• Discussion Lists or Forums/Usenet Newsgroups
• Technical Reports
• Software Archives
• Data Archives
• Library Catalogues
• Subject Databases
• Campus Wide Information Systems (CWIS)
• Patents
• Document Delivery
• Reference Sources
• Courseware Directories
• Others


ELECTRONIC JOURNALS AND NEWSLETTERS 


An electronic journal is a journal that is produced in an
electronic format; sometimes it is the electronic equivalent of a
paper-based journal, although an increasing number of
journals are now produced entirely in electronic format. Such
resources facilitate quick and easy access to current as well
as back volumes. Besides, one does not need to browse
through all the articles; one can pick
only the papers of interest. Most of these journals are full-text
journals. Examples include:


Elsevier Science (http://www.elsevier.nl/) : Elsevier 
Science has become the undisputed market leader in the 
publication and dissemination of literature covering the broad 
spectrum of scientific endeavours. Elsevier Science plays
an important role in advancing the technologies necessary to
create a seamless electronic information delivery environment.
Access to full-text data is fee-based. It provides access
to over 1500 scientific, technical and medical peer-reviewed 
journals; search to over 40 million abstracts from scientific 
articles; and link out to articles from over 120 other publishers. 
In 2002, ScienceDirect launched a new commercial
offering to academic libraries, adding a new license for users 
to gain electronic access to both Elsevier Science and 
Academic Press journals on ScienceDirect. ScienceDirect E- 
Choice enables access via a single convenient platform, and 
a single license agreement. Flexible access for ScienceDirect 
guest users is also offered. Guest users of ScienceDirect - 
that is users not associated with a ScienceDirect account - 
are permitted to browse and read abstracts from all of the 
ScienceDirect journals for free. They can also set up free
table-of-contents e-mail alerts and create personal journals.

ISI Web of Science (http://wos.mimas.ac.uk/) : ISI 
Web of Science Service is a massive resource base covering 
all the three main divisions of human knowledge, i.e., Science 
and Technology, Social Sciences and Humanities. The basic 
databases covered under Web of Science and made available 
are: 

• Science Citation Index Expanded with Cited References
and Author Abstracts (1981-)

• Social Sciences Citation Index Expanded with Cited
References and Author Abstracts (1981-)

• Arts and Humanities Citation Index with Cited
References (1981-)


Springer Science Online (http://www.springer.de/) : 
Springer publishes annually over 2,400 new books and 
approximately 500 journals, most of which are available in 
electronic form. A total of about 20,000 books are currently 
available, 60 percent of them in English. 


Blackwell Scientific Journals
(http://www.blacksci.co.uk/uk/journals.htm) : It covers over 250
journals from its offices in Oxford, Boston, Melbourne and
Berlin that are available online.


Academic Press Journals
(http://www.idealibrary.com) : Now part of Elsevier Science, IDEAL
is the online library for over 10 million authorized users in
academic, industrial and medical research. Following the
acquisition of Harcourt by Reed-Elsevier, the Academic Press
journal collection is being integrated into ScienceDirect, the
web database for scientific, technical and medical research.
Throughout 2002, users may continue to access electronic
journals from Academic Press, W.B. Saunders, Churchill
Livingstone, Bailliere Tindall and Mosby on the IDEAL platform
during the integration.


J-Gate (http://j-gate.informindia.co.in) : J-Gate, the
Gateway to a new world of journals, is a premier Indian portal
for e-journals conceived, developed and delivered by
Informatics, India. It is a family of products and a kind of single
source for librarians and information users to access, share
and manage their e-journals effectively. Among the high
points of J-Gate are a directory of about 10,000 e-journals
with links to journal and publisher sites; a Table of Contents
for several of these journals; and a searchable database with
links to full-text and to the Union Catalogue of journals in
leading national libraries, to support resource sharing. It
provides other facilities such as J-Gate Custom Content (JCC),
an exclusive local solution for management of the subscribed
journals in the library of an individual institution or in a
consortium mode. The other service being provided by JCCC
is a technology platform for sharing journal resources.


TABLE OF CONTENTS 


Tables of Contents (TOCs) of practically all e-journals are
made available by their respective publishers, generally for
free. There is a huge store of such information available on the
Net. This is a very useful resource for librarians who wish to
provide a Current Awareness Service. One needs
to locate such resources on the Net, bookmark them and provide
a value-added service to the users. Examples include the
‘Contents Direct’ service by Elsevier (http://www.elsevier.nl/),
which covers over 800 journals; the ‘Uncover’ service
provided by the CARL agency, which has TOCs of over 16,000
journals; and ISI’s TOC Alerting Service (Journal Tracker),
available at http://www.isinet.com/jtrack.


PREPRINTS 


The term “preprint” most often refers to a manuscript
that has not yet been published, but may have been reviewed
and accepted; submitted for publication; or intended for
publication and being circulated for comment. A preprint
accessible over the Web may also be referred to as an
“e-print.” Many e-prints are electronic versions of research
papers that have been submitted for dissemination and review
among peers; for publication in journals; or prior to
presentation at conferences. Preprints also cover papers that
authors have submitted for journal publication, but for which
no publication decision has been reached, or even papers
electronically posted for peer consideration and comment
before submission for publication. In fact, preprints can also
be documents that have not been submitted to any journal.
Some preprint servers may define preprints as any electronic
work circulated by the author outside of the traditional
publishing environment.


The following is a list of Web resources that offer 
preprint searches: 


PrePrint Network (http://www.osti.gov/preprints/ppnabout.html) :
It is a searchable gateway to preprint servers
that deal with scientific and technical disciplines of concern 
to DOE. The PrePRINT Network provides access to electronic 
preprints available from diverse sites. Developed by the U.S. 
Department of Energy (DOE) Office of Scientific and 
Technical Information (OSTI), the Network is a “one-stop 
shopping” site for preprints in science and technology. The
PrePRINT Alerts feature allows users to create personal
profiles which will then notify the user as new information is 
added. Preprints in the areas of physics, materials, chemistry, 
mathematics, biology, environmental sciences and other 
areas related to DOE’s research interests are accessible 
through the Network. In addition, the PrePRINT Alerts 
capability allows users to set up files matching their specific 
interests. As new information is added to preprint servers, 
users are notified automatically, thus eliminating the need to 
manually check for recent additions. The PrePRINT Network 
offers the user several access mechanisms. Users may
browse or search one specific preprint site, a selected set of
sites, or all of the listed sites. Its Browse option allows users to
view an alphabetical listing of all of the preprint sites included 
in the system and to visit any of the individual sites listed. 
Within the Search section, users may choose to perform an 
indexed search of the HTML pages of the available sites which 
are continually updated. This option returns hits for any pages 
and for linked pages that contain the specified search term, 
including some items that may not be actual preprints. A 
second option for searching within the PrePRINT Network
allows users to poll the search engines of selected preprint
sites with a single query. This search capability then returns
a compiled results list. Its Subject Pathways option offers users
the ability to browse the preprint resources by subject area.
This section includes both preprint servers and preprints
posted by individual scientists on their own sites.
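The single-query search described above, in which one query is passed to the search engines of several selected sites and the answers are compiled into one results list, can be sketched roughly as follows. This is only an illustration: the site names, the titles and the `make_site` helper are hypothetical stand-ins, and nothing here reflects how the PrePRINT Network itself is implemented.

```python
from typing import Callable, Dict, List, Tuple

def make_site(titles: List[str]) -> Callable[[str], List[str]]:
    """Build a hypothetical stand-in for one preprint site's own search engine."""
    def search(query: str) -> List[str]:
        q = query.lower()
        return [t for t in titles if q in t.lower()]
    return search

# Hypothetical sites with made-up holdings (assumptions, not real servers).
SITES: Dict[str, Callable[[str], List[str]]] = {
    "physics-server": make_site(["Lattice gas models", "Quantum chaos in billiards"]),
    "math-server": make_site(["Chaos and ergodic theory", "Integrable systems"]),
}

def federated_search(query: str, selected: List[str]) -> List[Tuple[str, str]]:
    """Send one query to each selected site and compile a single results list."""
    compiled: List[Tuple[str, str]] = []
    for name in selected:
        for title in SITES[name](query):
            compiled.append((name, title))
    return compiled

results = federated_search("chaos", ["physics-server", "math-server"])
for site, title in results:
    print(f"{site}: {title}")
```

Each result is tagged with the site it came from, so the user can still tell which server holds a given item even though only one query was typed.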


e-MATH - Directory of Mathematics Preprint and e-Print
Servers (http://www.ams.org/global-preprints/) : The
mission of the Directory of Preprint and e-Print Servers from 
American Mathematical Society (AMS) is to make available 
to the mathematical community the current homepage URLs 
and email contacts of all mathematical preprint and e-print 
servers throughout the world. This directory provides 
mathematicians with a tool to find any of these servers in 
order to browse the articles posted on them and, in many 
cases, to post an article to the server itself. The servers are 
divided into three categories— umbrella servers which cover 
all areas of mathematics such as the Front for the 
Mathematics ArXiv and the MPRESS/MathNet.preprints 
server, special subject servers and servers administered by 
mathematics departments and institutes. There is an 
additional link to retired preprint services. Although the AMS 
uses automated procedures to check the currency of the 
server URLs, it appreciates being notified 
(webmaster@ams.org) of any URL or e-mail contact changes, 
new preprint or e-print servers, consolidation of servers, etc. 
Some such servers classified by AMS are: Umbrella Servers; 
Special Subject Servers; Institute and Department Servers; 
and Retired Preprint Services. 

Other preprint sites are listed below for the use of
Internet users.

• American Physical Society Preprint Server (closed to
new submissions as of May 31, 2000) at
http://publish.aps.org/eprint/.

• Astronomy & Astrophysics Preprints & Abstracts at
http://www.wbhead.com/WWWVL/Astronmy/astroweb/yp_preprint.html.




• Automated Mathematics e-Print Archives at
http://www.msri.org/publications/preprints/.

• Cellular Automata and Lattice Gases e-Prints (LANL)
at http://xxx.lanl.gov/archive/comp-gas.

• CERN Preprint Server at
http://weblib.cern.ch//Home/Library Catalogue/Articles and Preprints/Preprints/index.php.

• Chaotic Dynamics e-Prints (LANL) at
http://xxx.lanl.gov/archive/chao-dyn.

• Chemical Physics Preprint Database - Brown University
at http://www.chem.brown.edu/chem-ph.html.

• Chemistry Preprint Server (at ChemWeb) at
http://www.chemweb.com/preprint.

• Clinmed Netprints at http://clinmed.netprints.org/.

• Computer Science e-Prints (LANL) at
http://xxx.lanl.gov/archive/cmp-lg.

• Cogprints at http://cogprints.soton.ac.uk.

• CoRR - Computing Research Repository at
http://www.acm.org/pubs/corr/.

• Directory of Mathematics Preprint and e-Print Server
(American Mathematical Society) at
http://www.ams.org/global-preprints.

• Enviro-Science e-Print Service at http://

• Exactly Solvable and Integrable Systems e-Prints
(LANL) at http://xxx.lanl.gov/archive/solv-int.

• Institute for Mathematical Sciences Preprint Server at
www.math.sunysb.edu/preprints.html.

• Mathematical Physics Preprint Server at
http://rene.ma.utexas.edu/mp_arc/index.html.

• Mathematics e-Prints (LANL) at
http://xxx.lanl.gov/archive/math.



• Mathematics Preprint Server at
www.mathpreprints.com/math/Preprints/show/.

• NCSTRL (Networked Computer Science Technical
Reference Library) at www.ncstrl.org.

• Nonlinear Sciences e-Prints (LANL) at
http://xxx.lanl.gov/archive/nlin-sys.

• Nuclear Theory e-Prints (LANL) at
http://xxx.lanl.gov/archive/nucl-th.

• Pattern Formation and Solitons e-Prints (LANL) at
http://xxx.lanl.gov/archive/patt-sol.

• Physics e-Prints (LANL) at
http://xxx.lanl.gov/archive/physics.

• PrePrint Network - Searchable gateway to preprint
servers that deal with scientific and technical disciplines
of concern to DOE at http://www.osti.gov/preprint/.

• PubMed Central at http://www.pubmedcentral.nih.gov/.

• Quantum Physics e-Prints (LANL) at
http://xxx.lanl.gov/archive/quant-ph.

• Theoretical Ecology Preprint Server at
http://www.nceas.ucsb.edu:8504/esa/ppr/ppr.Query.


DISCUSSION LISTS OR FORUMS/USENET 
NEWSGROUPS 


Discussion lists, which are also sometimes called mailing
lists or listservs, are e-mail based lists available to a group
of users who are interested in a particular topic in a specified
subject area. Software enables e-mail users to
subscribe to a list; subscribers can then post messages to the
whole group, participate in discussions and receive all the
messages that are posted. Joining the forum is called
subscribing, while leaving the forum is called signing off.
Usenet newsgroups, a major network resource that serves the
purpose of current awareness, are a worldwide distributed
system of bulletin boards arranged hierarchically into topic
areas.


These are similar to discussion lists in that different
users can discuss a particular area of interest, but users do
not have to subscribe, and anyone can view the messages,
provided they have access to the software required. Usenet
newsgroups and discussion lists thus differ in the
means of accessing the information. However, they are
commonly used in similar ways. There are three such
ways: (i) users may post a query or a reply; (ii) they
may lurk in a newsgroup or list, that is, read the messages
and follow the discussion without posting a message; or
(iii) they may browse an earlier discussion using an
archive. Discussion lists and Usenet newsgroups provide an
important platform to keep up to date with current
developments, seek solutions to the problems users pose,
and learn about new Net resources. Of course, there are some
drawbacks also, such as receiving irrelevant mail
(junk mail). There are thousands of such lists
and newsgroups on the Net. An example is:


Gentalk - Subscription to : listserv@usa.net. It
provides a forum for discussion of genetic problems, lab
protocols, and current issues in genetics and genetic
engineering in general.
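Subscribing to and signing off from such a list is normally done by e-mailing a one-line command to the list server. The sketch below only builds the two messages and does not send anything. It assumes that the server understands the `SUBSCRIBE listname Firstname Lastname` and `SIGNOFF listname` commands used by L-Soft LISTSERV; whether the Gentalk list actually runs that software is an assumption, as are the list name `GENTALK` and the sender address.

```python
from email.message import EmailMessage

LIST_SERVER = "listserv@usa.net"  # subscription address given in the text
LIST_NAME = "GENTALK"             # assumed list name on that server

def command_message(sender: str, command: str) -> EmailMessage:
    """Wrap a one-line list-server command in a plain-text e-mail."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = LIST_SERVER
    msg.set_content(command)      # the command goes in the message body
    return msg

def subscribe(sender: str, full_name: str) -> EmailMessage:
    """Joining the forum: the 'subscribing' step."""
    return command_message(sender, f"SUBSCRIBE {LIST_NAME} {full_name}")

def sign_off(sender: str) -> EmailMessage:
    """Leaving the forum: the 'signing off' step."""
    return command_message(sender, f"SIGNOFF {LIST_NAME}")

# Hypothetical subscriber; the messages are built but never sent.
join = subscribe("reader@example.org", "Jane Doe")
leave = sign_off("reader@example.org")
print(join.get_content().strip())
print(leave.get_content().strip())
```

Messages meant for the whole group go to the list's own posting address, not to the command address shown here.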


SOFTWARE ARCHIVES 


There are thousands of software packages, both
shareware (which allows free trial use) and freeware, for all
purposes and all makes of computer, available via the Internet.
Software archives are held at a number of sites on the Internet.
Examples include:


HENSA - The Higher Education National Software
Archives (http://www.hensa.ac.uk/) : This software is mostly
in the public domain or shareware, covering a wide range of
applications but especially networking. The archive of
microcomputer software is at Lancaster, and the Unix archive
at Kent.


The Archive, SunSITE Northern Europe FTP
Archive (http://src.doc.ic.ac.uk/) or (ftp://src.doc.ic.ac.uk/) :
It is a massive archive of software, USENET newsgroups and
FAQs, e-journals, etc.


DATA ARCHIVES 


A data archive is a permanent, electronic collection of 
datasets with accompanying metadata such that users of the 
data can acquire, understand and use the data. Data archives 
are resource centres for analysts who use data for research 
and teaching. Data archiving is a method of conserving very 
expensive resources and ensuring that their research 
potential is fully exploited. Archives ensure that when 
technology changes, the data in their holdings are technically 
transformed to remain readable in the new environment. Their 
functions usually include ensuring that data are
preserved against technological obsolescence and physical
damage; cataloguing their technical and substantive properties
for information and retrieval; and supplying them in an
appropriate form to secondary users. An archive is thus more
than a long-term backup, and more than an index or catalogue
with pointers to datasets stored elsewhere. Data archives have been
established in most European countries and in the United 
States. They are actively used for testing hypotheses and for 
other scholarly purposes. These are more useful in the areas 
of social sciences and humanities. For example: 


National Space Science Data Center (NSSDC) at
http://nssdc.gsfc.nasa.gov/ is an archive that provides access to
a wide variety of astrophysics, space physics, solar physics,
lunar and planetary data from NASA space flight missions, in
addition to selected other data and some models and
software. NSSDC provides online information bases about
NASA and non-NASA data as well as the spacecraft and
experiments that have provided or will provide public access
data. NSSDC also provides information and support relating to
data management standards and technologies, and much more.
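The functions of a data archive outlined in this section (cataloguing the technical and substantive properties of each dataset, retrieving datasets, and transforming them when technology changes) can be pictured with a small sketch. The record fields, identifiers and format names below are all hypothetical and are not taken from any real archive.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class ArchivedDataset:
    identifier: str   # hypothetical local identifier
    title: str        # substantive property
    subject: str      # substantive property
    file_format: str  # technical property

def find_by_subject(holdings: List[ArchivedDataset], subject: str) -> List[ArchivedDataset]:
    """Retrieve catalogued datasets by subject (case-insensitive)."""
    return [d for d in holdings if d.subject.lower() == subject.lower()]

def migrate(dataset: ArchivedDataset, new_format: str) -> ArchivedDataset:
    """Record a format migration; the descriptive metadata is carried over unchanged."""
    return replace(dataset, file_format=new_format)

# Made-up holdings for illustration only.
holdings = [
    ArchivedDataset("ds-001", "Household survey 1991", "sociology", "tape-ebcdic"),
    ArchivedDataset("ds-002", "Lunar surface images", "planetary science", "tiff"),
]

# When the old storage technology becomes obsolete, move affected data to a readable format.
holdings = [migrate(d, "csv") if d.file_format == "tape-ebcdic" else d for d in holdings]
print(find_by_subject(holdings, "Sociology"))
```

Keeping the metadata with the dataset through every migration is what lets a secondary user still understand the data after the original format has disappeared.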


SUBJECT DATABASES 


A database is a collection of records, each of which
contains details of a different data item, whether numeric,
textual or image-based, and which is usually available in a
searchable format. There is a wide range of such sources
available, both fee-based and free, such as library
catalogues, commercial catalogues and bibliographical
databases. Examples include:


PubMed at http://www.ncbi.nlm.nih.gov/PubMed/ is
one of the several versions of MEDLINE made available via
the Internet. The database provides access to millions of
citations held in the original MEDLINE database plus
pre-MEDLINE, that is, basic data which has not yet been added
to MEDLINE. Besides, the site also provides access to other
related databases.
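What "a collection of records in a searchable format" means in practice can be shown with a toy bibliographic database and a simple word index. The records and titles below are invented for illustration, and real services such as PubMed use far more elaborate indexing; this is only a sketch of the general idea.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical records: record number -> title.
RECORDS: Dict[int, str] = {
    1: "Gene therapy for cystic fibrosis",
    2: "Clinical trials of gene transfer",
    3: "Cystic fibrosis screening in newborns",
}

def build_index(records: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every word to the set of record numbers whose title contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for rid, title in records.items():
        for word in title.lower().split():
            index[word].add(rid)
    return index

def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return the numbers of records containing every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

INDEX = build_index(RECORDS)
print(search(INDEX, "cystic fibrosis"))
print(search(INDEX, "gene"))
```

The index is built once and then consulted for every query, which is why a search does not have to scan the records themselves.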


TECHNICAL REPORTS 


Technical reports are reports that provide
technical data collated by a committee and contain material
not considered suitable or appropriate for a standard. Such
reports have manifold benefits, particularly in the area of S&T.
These:


— Provide additional material which can be referred to, 
but is not likely to be found in a standard; 


— Provide details of interim/progress reports or completed 
R&D projects; 



— Provide more details than papers in journals or 
conferences; 


— Are intended as an informative publication only; 


— Are recognized internationally and nationally by being 
published by the national standards body, and by being 
the national equivalent to ISO and IEC Technical 
reports; and 


— Serve as a valuable tool for scientific communication. 


With the help of the Internet, such resources can now be
accessed more easily. Examples include:


Langley Technical Reports Server (LTRS)
(http://techreports.larc.nasa.gov/ltrs/) : LTRS is a service which
allows users to search available online NASA published 
documents, including Meeting Presentations, Journal Articles, 
Conference Proceedings, and Technical Reports. Many 
documents are available in compressed PostScript and PDF 
formats. All documents are unclassified and publicly available. 


NASA Technical Report Server (http:// 
techreports.larc.nasa.gov/cgi-bin/NTRS) : NASA Technical 
Report Server (NTRS) is an experimental service that allows 
users to search the many different abstract and technical 
report servers maintained by various NASA centers and 
programs. Specifically, it is a unified interface to many 
separate WAIS servers. NTRS is both a superset of
the various servers, and a canonical listing of the servers.
NTRS is intended for use by the various research communities 
targeted by the various report and abstract servers. It is open 
to all members of the Internet/World Wide Web community. 


Networked Computer Science Technical Reference Library
(http://www.ncstrl.org/) : NCSTRL, the Networked Computer
Science Technical Reference Library, is a federation of digital
libraries providing computer science materials. The
architecture of the original NCSTRL was based largely on
the Dienst software. This is a very well developed system for
worldwide access to computer science technical reports.


LIBRARY CATALOGUES 


The Library Catalogue lists all of the material available
in the library, with information on where to find it, whether it
is available and how long one may borrow it. Reservations
can be placed on material that is currently on loan to another
person. The following materials can be found on the Library
Catalogue: books; periodicals (e.g. journals, magazines,
newspapers); reports and official publications; pamphlets and
official documents; theses; audiovisual material; microfiche
and microforms; and electronic resources.


The Catalogue also allows you to search for information
regarding your own use of the library as a registered user,
such as what loans you have out and when
they are due back, what reservations you have outstanding,
or the fines you currently have on your borrower card. You
can renew books you have on loan, provided they are not
overdue or reserved by another borrower; you can find out if
books you have reserved have arrived; and you can
check whether your interlibrary loans have arrived. A
large number of catalogues can be accessed online via the
Internet. Examples include:


The Library of Congress Online Catalogue (http:// 
catalog.loc.gov/) : The Library of Congress Online Catalogue 
is a database of records representing the vast collection of 
materials held by the Library. In addition to these records, 
the Online Catalogue provides cross-references, notes, and 
circulation status, as well as information about Library 
materials still at the acquisitions stage. 


The Library of Congress Online Catalogue contains
millions of records representing books, serials,
computer files, manuscripts, cartographic materials, music,
sound recordings, and visual materials. The Catalogue also
displays searching aids for users, such as cross-references
and scope notes. The catalogue records reside in a single
integrated database; they are not separated according to type
of material, language of material, date of cataloguing, or
processing/circulation status. As an integrated database, the
Online Catalogue includes 3.2 million catalogue records from
an earlier database. These catalogue records, primarily for
books and serials catalogued between 1898 and 1980, are
being edited to comply with current cataloguing standards
and to reflect contemporary language and usage.


Questia (http://www.questia.com/) : It is believed to
be one of the world’s largest online libraries. The documents
are arranged subject-wise.
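The renewal condition mentioned earlier in this section, namely that a registered user can renew a book only if it is neither overdue nor reserved by another borrower, is the kind of rule an online catalogue has to check. A minimal sketch follows; the `Loan` record, its field names and the sample loans are hypothetical and do not describe any particular library system.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Loan:
    title: str
    due: date
    reserved_by_other: bool = False  # True if another borrower has placed a reservation

def can_renew(loan: Loan, today: date) -> bool:
    """A loan may be renewed only if it is not overdue and not reserved by someone else."""
    overdue = today > loan.due
    return not overdue and not loan.reserved_by_other

# Made-up loans checked against a fixed date.
today = date(2010, 12, 21)
on_time = Loan("Digital Preservation", due=date(2010, 12, 31))
late = Loan("Metadata Basics", due=date(2010, 12, 1))
wanted = Loan("Library Consortia", due=date(2010, 12, 31), reserved_by_other=True)

for loan in (on_time, late, wanted):
    print(loan.title, "->", "renewable" if can_renew(loan, today) else "not renewable")
```

A book due today is treated here as not yet overdue; a real system would follow the library's own loan policy on that point.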


CAMPUS WIDE INFORMATION SYSTEMS (CWIS) 


A Campus-Wide Information System (CWIS) is an information
system intended to present an integrated view of the institution
to the members of its community, as well as to alumni,
prospective students and others with an interest. Generally,
within the menu hierarchy, a broad array of local and Internet
resources is made available. The information included at
such sites generally comprises: information about
student organizations and campus services; library
catalogues and other databases; research opportunities;
university newsletters and journals; technical reports and
preprints; administrative or academic department policies;
schedules of lectures, plays or movies on campus; athletic
event schedules; directory information; faculty research
interests and publications; and course offerings and syllabi.
Some of the advantages of providing information via CWIS are:


— Making your department or organization more visible. 


— Saving production time and expenses of distribution and 
printing. 


— Fast and easy updating of information.


— Access to library catalogues and other documents.


— Accessible 24 hours a day, 365 days a year to all
members of the community via the campus-wide
network.


— Cost-effective means to reach a worldwide audience;
and


— Easy to use.


Examples include:


Harvard University - Campus Wide Information System
(http://www.harvard.edu) : It is an excellent example
of a CWIS, providing information ranging from a simple virtual
tour of the campus and leisure activities to complex resource
bases and much more. The home page of Harvard University
lists the various areas on which one can obtain
complete information.


Massachusetts Institute of Technology- Campus wide 
Information System (http://web.mit.edu) : The other example 
is of the Massachusetts Institute of Technology. Broadly the 
site provides you pointers to access detailed information 
regarding: Spotlights; News - latest news, research, 
OpenCourseWare; Academics - admissions, schools, 
courses, libraries; Research - labs, centers, and programs; 
Administration - offices and programs, giving to MIT; 
Resources - for alumni, faculty, staff, and students; Campus 
life - groups, activities, jobs available; Events calendar - 
campus events and activities; About MIT - facts, map, virtual
tour, evolving campus; and Search - anything you are
interested in.


RESOURCE CATALOGUES


A resource catalogue is a database of Internet resource
descriptions that is made accessible through a structured and/
or unstructured network service. The term is sometimes used
synonymously with ‘portal’ and ‘gateway’, e.g., by the
Resource Discovery Network. An Internet Resource
Catalogue may be just one of the services offered by a
gateway or portal. It lists Internet resources and provides
hypertext links to these sites. Most of these resources are
discipline specific, and are maintained by departments in
universities/research institutes, libraries in universities, and
the like. They serve as useful starting points for navigation
and resource identification. While these may be part of a
larger catalogue listing all kinds of Internet sites (for instance,
Yahoo, Global Network Navigator, WWW Virtual Library, etc.),
they usually categorize resources by sub-topic and/or source
type. One good example is:

Mathematics Server at the Penn State University at
http://www.math.psu.edu. The Math Server facilitates access
to Course Home Pages; Instructional Material; Seminars;
Colloquia; Conferences; Subject Area Pages; Research
Centers; Reference Sources, etc. in the area of mathematics.


PATENTS 

According to U.S. Patent and Trademark Office 
publication, “a patent for an invention is a grant of a property 
right by the Government to the inventor... The right conferred 
by the patent grant is... the right to exclude others from 
making, using, or selling the invention. What is granted is not 
the right to make, use, or sell, but the right to exclude others 
from making, using, or selling the invention.” There are also
resources that give general information regarding patents,
such as how to file a patent. For example, General
Information Concerning Patents
(http://www.uspto.gov/web/offices/pac/doc/general/index.html)
is one such source.

There are other such resources available on the 
Internet. Examples include: 



US Patent and Trademark Office (http://www.uspto.gov) :
The PTO promotes industrial and technological progress in
the United States and strengthens the national economy by: 
administering the laws relating to patents and trademarks; 
advising the Secretary of Commerce, the President of the 
United States, and the administration on patent, trademark, 
and copyright protection; advising the Secretary of 
Commerce, the President of the United States, and the 
administration on the trade-related aspects of intellectual 
property. 


It provides free access to bibliographic data of US
patents issued since 1 January 1976, and also has collections
of full-text patents for the US as well as other countries.


IBM Patent Server (http://patent.womplex.ibm.com) :
It provides access to over 30 years of US Patent and
Trademark Office (USPTO) patent descriptions and to images
covering over the last 20 years.


DOCUMENT DELIVERY 


Document Delivery Service (DDS) is a fee-based
service provided to faculty, staff and students, as well as to
business, industry and individual researchers, by various
universities and other information providers/organizations.
Document delivery staff generally locate, photocopy and
send directly to the customer's office or home a variety of
documents, either owned by the library or obtained from
other resources to which it has access. The documents are sent
by traditional post or electronically. There are a number
of such services available on the Internet, most of which are
fee based. Examples include:


Iowa State University - Document Delivery Service at
http://www.lib.iastate.edu/services/delivery service.html.
The Iowa State University (ISU) Library currently offers the
following services for delivery of library materials: Document
Delivery Service-Storage Delivery Service; Vet Med Express
(VME). Since the details about these services are included in
the web site whose prototype is reproduced here for referral
purposes, they are not repeated.


REFERENCE SOURCES 


A vast number of reference sources, such as
dictionaries and directories, are
freely accessible on the Internet. Examples include:


Refdesk.com (http://www.refdesk.com) : Refdesk is 
only about indexing quality Internet sites and assisting visitors 
in navigating these sites. Refdesk.com has three goals: (1) 
fast access, (2) intuitive and easy navigation and (3) 
comprehensive content. The site has extensive coverage in 
all the areas. 


Encyclopaedia Britannica Online
(http://www.britannica.com) : It used to cost up to £3,000 to
purchase a complete set, but now the Encyclopaedia’s 32
volumes, or 44 million words, are accessible on the Internet
for free. What is more, it offers more than just access to
entries in this famous encyclopaedia: for each search topic you
input, it provides links to some of the Web’s most informative
sites, plus pointers to books and magazine articles that relate
to your query.


Merck Manual (http://www.merck.com/pubs/) : The
Merck Manual is one of the most widely used medical
texts in the world. Written by over 300 experts, it covers all
but the most obscure disorders.


COURSEWARE DIRECTORIES 


There is a wide variety of courseware resources 
available via the Internet. These are very valuable resources. 
Examples include: 



Courseware Resources
(http://www.iim.uts.edu.au/usingcourseware.shtml) : There is a
wide variety of interactive
multimedia (IM) courseware resources available for using and 
developing multimedia for teaching and learning. These 
include: Databases, catalogues and clearinghouses; 
Development tools/software/Discussion groups; Publishers/ 
IM Courseware vendors; Publications relevant to courseware 
development. 


There are courseware examples and other multimedia 
resources available for the following discipline areas: 


Business; Design, Architecture & Building; Education; 
Engineering; Humanities and Social Sciences; Law; 
Mathematical & Computing Sciences; Nursing & Health 
Sciences; and Science. 


Science Courseware Resources
(http://www.iim.uts.edu.au/using/courseware/science.html) : The
site gives courseware resources on - General Science 
Resources; Chemistry and Materials Science; Physics; 
Biology; Microbiology; Cell & Molecular Biology; and 
Environmental/Earth Sciences. 


Edutech’s (Educational Technology Information and
Resources - UK) Resources
(http://www.warwick.ac.uk/ETS/edu.tech/edu-tech.html) : This
has a discipline index to find
resources under a wide variety of subjects.


OTHERS 


Besides the types of resources mentioned above, there
are several other resources as well. New resources are being
added to the Internet on a continuous basis, hence any list is
liable to become dated. It may, however, be reiterated that
many sources fulfil the conditions for more than one type,
and therefore some overlap of resources across types is
unavoidable.


(B) By Category or Resource Types 


Category or resource types are a way of categorizing
the different types of information that can be found on the
Internet. These are basically service oriented, facilitating
access to resources. The resources that can be grouped
under this category are generally subject specific. The main
categories are briefly outlined below,
with suitable examples for each:


• Subject Catalogues and Directories

• Rating and Reviewing Services

• Subject-based Gateway Services and Virtual Libraries/
Resource Guides

• Virtual Laboratories


SUBJECT CATALOGUES AND DIRECTORIES 


These are sometimes also referred to as subject listings
and indexes. They can provide better
search results because, in most such directories, authors
themselves are responsible for describing their own materials,
so the descriptions are generally more meaningful. As
such, it is relatively easy to determine the potential relevance
of materials from the results. Examples include:


Yahoo! (http://www.yahoo.com/) : It is probably the
best of the mainstream portals. Yahoo is not a search engine,
but a directory presented as a hierarchical subject index,
which can be searched. It is a good starting point for searching
for broad or general rather than area-specific
information. It also has good sections on reference resources,
libraries, publishers, etc. The snapshot of the Yahoo! homepage
given in Fig. 5.2 shows how the main subject
categories are listed.


Fig. 5.2. Snapshot of Yahoo.com


CyberStacks (http://www.public.iastate.edu/~CYBERSTACKS) :
It is a guide to significant WWW and
other Internet resources in selected fields of science and
technology.


Encyberpedia (http://www.encyberpedia.com/edindex.htm) :
It is a directory of subject-orientated links,
arranged by broad subject groups, but with a more detailed
index.


Magellan (http://www.magellan.excite.com) : It is a
good online guide to the Web, together with reviews by broad
subject areas; with ‘Search Voyeur’, an intriguing real-time
site that allows you to ‘spy’ on searchers.


BUBL Information Service (http://bubl.ac.uk) : It is a
national information service for the higher-education
community in the UK. It links to over 10,000 quality Internet
resources covering all subject areas, sub-divided by type.
It is excellent for tracking down UK institutions.


RATING AND REVIEWING SERVICES 


Many such services have been developed, of varying value
and usefulness, facilitating access to Internet materials
through various forms of site ratings and reviews. The
quality of a resource within a subject can be fairly
judged with the help of these services, which is very important
from the point of view of library and information science
professionals rendering information services using Internet
resources.


Examples include: 


eBLAST-Encyclopedia Britannica’s Internet Guide 
(http://www.ebig.com/) : eBLAST covers more than 125,000 
WWW sites and is produced by the well-known encyclopedia 
publisher, Encyclopedia Britannica. Its editors search the web 
to identify the highest quality web resources, which are then 
clearly and concisely described, rated according to consistent 
standards, and indexed for superior retrieval. Needless to 
say, such resources ought to be of high quality, and one 
does not need to verify the authenticity of search results 
from this source. 


Lycos Top (http://point.lycos.com/) : It offers an 
associated directory of rated and reviewed resources in 
addition to the engine itself and its directory component. 


SUBJECT-BASED GATEWAY SERVICES AND VIRTUAL 
LIBRARIES/RESOURCE GUIDES 


These are generally developed by librarians or subject 
experts, and are an excellent source providing access to 
detailed descriptions of high quality resources. Some of these 
are known as virtual libraries; some are referred to as 
Subject-based Gateway Services, while others are called 
Resource Guides. These are the best sources for identifying 
resources in various subject disciplines as they cover the full 
range of Internet materials. There are many such excellent 
Gateways in existence on the Net in the area of S&T. Some 
of these are indicated below. 


BIOME (http://biome.ac.uk) : This is a consortium-based 
hub providing access to quality resources on the Internet in 
the fields of agriculture, food, forestry, pharmaceutical 
sciences, medicine, nursing, dentistry, biological research, 
veterinary sciences, the natural world, botany, zoology, and 
the like. It consists of five subject gateways, which are 
cross-searchable and cross-browsable. The service is 
coordinated by the University of Nottingham, which is joined 
by a formidable range of high-profile partners and content 
providers from the UK health and life-science sectors. 


EMC Hub (http://www.emc.ac.uk/index.html) : A 
collaborative service providing easy access to quality Internet 
resources in engineering, mathematics and computing. It is 
run and hosted by Heriot-Watt University in Edinburgh, in 
partnership with Cranfield University and the University of 
Birmingham. 


INASP Links & Resources (http://www.inasp.info/ 
links/index.html) : The links section of the International 
Network for the Availability of Scientific Publications (INASP) 
provides a gateway site to selected Web sites and Internet 
resources that may be of special interest to the library and 
information-science communities, and to scientists and 
publishers in developing countries. In particular it is designed 
to assist organizations involved in electronic networks for 
development, and those who are thinking of moving to an 
electronic environment for scholarly communication. 


PSIgate (http://www.psigate.ac.uk/home.htm) : It is a 
major gateway or hub, developed by the Consortium of 
Academic Libraries in Manchester (CALIM), bringing together 
high-quality Internet resources in the fields of astronomy, 
physics, chemistry, materials science, the earth sciences and 
environmental science. 




JISC (http://www.jisc.ac.uk/) : The JISC (Joint 
Information Systems Committee) is establishing posts known 
as Resource Guides in a number of subject areas. These 
posts will take over the maintenance of lists of resources 
known as Subject Guides. In addition, in co-operation with 
associated resource providers, the Resource Guides provide 
information, documentation and training opportunities. Two 
such posts, covering the Arts and Humanities, and the Social 
Sciences, have been long established; others have started 
only recently. 


Subject Guides are currently available in the following 
subject areas: Arts and Humanities, incorporating the subject 
areas of Language, Literature and Culture; Engineering, 
Mathematics and Computer Sciences, incorporating the 
Information Sciences and Aerospace and Defense 
Engineering; Life Sciences and Health Studies, including 
Biology and Biochemistry, the Natural World, and Dentistry 
and Nursing Studies; Physical Sciences, including Chemistry 
and Physics; and Social Sciences, including Business, 
Economics, Geography, Law, Politics and Psychology. 


The JISC provides a number of electronic resources, 
services and development projects aimed at meeting the needs 
of those working and studying within higher and further 
education. These are brought together as Subject 
Guides under a number of subject headings. Each Resource 
Guide, when in position, takes over the maintenance of the 
associated Subject Guide, and produces print versions and 
ephemera that complement the Web-based guides. There 
are overlaps within the subject areas and it is advisable to 
browse across the subjects. Each section of the Guides 
includes a list of resources with details of access 
arrangements. Some are free, require no license, 
subscription, or registration, and can be accessed 
immediately. Other resources require subscription or 
registration. The JISC-funded Resource Discovery Network 
(http://www.rdn.ac.uk/) provides subject-based catalogues 
of descriptions of quality-assessed resources. 


VIRTUAL LABORATORIES 


The rationale of Virtual Laboratories is to offer users 
around the world, irrespective of their local infrastructure, 
virtual access to a high-technology, world-class real laboratory. 
They aim to provide access to a network of renowned scientists 
and laboratories, offering research, technology and 
consultation. The specialty of such sites is the interactive 
manner in which they need to be accessed. Examples include: 


INTERACTIVE- Virtual Laboratory for Nucleic Acid 
and Protein Chemistry (http://www.interactiva.de/) : The 
basic aim of this Virtual Laboratory is to offer users around 
the world, irrespective of their local infrastructure, virtual 
access to a high-technology, world-class, real laboratory for 
fine chemistry, in particular for the synthesis of 
oligonucleotides and oligopeptides. It gives access to genetic 
information which can be searched, manipulated and understood. 
This site provides tools to sequence your own genetic 
information and gain understanding. Besides, the site also 
provides access to a network of renowned scientists and 
laboratories offering research, technology and consultation 
in bio-organic chemistry. The main focus is on nucleic acid, 
peptide and saccharide chemistry. It works as a virtual 
factory: a specific network is formed which is tailored to 
solving the problem at hand. 


Virtual Chemistry (http://neon.chem.ox.ac.uk/ 
vrchemistry/) : Virtual Chemistry is created and maintained 
by research students at the University of Oxford (UK), which 
uses the latest multimedia technology for virtual experiments. 
Besides including a range of Virtual experiments, the site also 
facilitates access to RealPlayer and has additional links to 
multimedia learning. 




Virtual Laboratories in Probability and Statistics 
(http://www.math.uah.edu/stst/) : This is created by the 
University of Alabama’s Mathematical Sciences Department. 
This project is partially funded by the National Science 
Foundation (NSF) and provides interactive, web-based modules 
for students and teachers of probability and statistics. 


(C) By Organization 


Under this category, resources have been grouped on 
the basis of type of organization which is responsible for 
collating and providing access to resources via Internet. These 
could be a collection of WWW pages that are created and 
maintained by a particular organization. The opening page is 
generally referred to as the home page. Organizational sites 
always include subject-based pages and personal home 
pages. 

These resources could be from: 

• Booksellers, Publishers and the Media 
• Commercial Online Information Retrieval Services 
• Government, Government-Related and International 
  Organizations 
• Professional Associations 
• Networking Organizations 
• University Departments 


BOOKSELLERS, PUBLISHERS AND THE MEDIA 


All these provide information regarding resources about 
publishing, bookselling and related aspects available on the 
Internet. Examples include: 


IBIC-Internet Book Information Centre 
(http://sunsite.unc.edu/ibic/) : It is an information server from 
SunSITE at the University of North Carolina (USA). This site 
provides good links to a wealth of publishing and bookselling 
resources on the Internet, including awards and bestsellers, 
author information, book reviews, USENET newsgroups as 
well as directories of publishers and booksellers themselves. 


The World-Wide Web Virtual Library - Publishers 
(http://www.comlab.ox.ac.uk/archive/publishers.html) : It is 
an extensive alphabetical list of Internet-accessible publishers 
from around the world. This site also provides links to selected 
online bookstores and to broadcasters. 


COMMERCIAL ONLINE INFORMATION RETRIEVAL 
SERVICES 


Such types include resources from commercial 
organizations available on the Net for information retrieval 
purposes. Access to such resources is generally fee-based. 
Examples include: 


STN International (http://www.fiz-karlsruhe.de/stn.html) : 
It is an online retrieval service provided by STN 
(Scientific and Technical Information Network), Germany, 
covering over 100 scientific and technical databases with a 
strong emphasis on chemical information. 


DataStar Web (http://www.krinfo.ch/krinfo/products/dsweb/) : 
It is another online information service, provided by 
Knight-Ridder Information (USA). This is a multidisciplinary 
online host, with an emphasis on healthcare, biomedical, 
biotechnology and business information. The server provides 
brief keyword-searchable descriptions of all DataStar 
databases, listed by their codes. 


GOVERNMENT, GOVERNMENT-RELATED AND 
INTERNATIONAL ORGANIZATIONS 


These are the resources put up by government, 
government-related and other international organizations. 
Generally such sources could be taken as authentic, since 
responsible agencies are answerable for information content. 
Such sites are a good source of information regarding 
government and other organizations. Examples include: 


FedWorld Information Network (http://www.fedworld.gov/) : 
It is an information server from NTIS 
(National Technical Information Service), USA, which includes 
US Government information servers, FTP, Gopher, and telnet 
services organized by subject. It also reports databases and 
software from NTIS. 


PROFESSIONAL ASSOCIATIONS 


Such information servers provide information regarding 
professional associations like American Library Association, 
etc. Examples include: 


IEE Home Page (http://www.iee.org.uk/) : This is 
provided by the Institution of Electrical Engineers (UK), covering 
professional, publishing and information services from the 
IEE. 


BCS Net (http://www.bcs.org.uk/) : The British Computer 
Society (UK) is the provider of BCS Net, which includes BCS 
structure, events, awards, publications, etc. 


NETWORKING ORGANIZATIONS 


These resources are generally related to network 
issues, and are good sources for up-to-date information on 
network developments. Examples include: 


IETF-Internet Engineering Task Force 
(http://www.ietf.cnri.reston.va.us/) : It works under the auspices 
of the Internet Society. IETF provides management, 
standards and technical information and many sources of 
further information. 


TERENA- Trans-European Research and Education 
Networking Association (http://www.terena.nl/) : The TERENA 
Secretariat (the Netherlands and Europe) gives information 
about TERENA and its predecessor EARN. It also covers 
European-wide networking initiatives, including reports, calls 
for proposals, conference papers and abstracts, and 
contains documentation for major Internet tools and links to 
international networking centres, e.g. InterNIC. 


UNIVERSITY DEPARTMENTS 


This type of resource has already been discussed 
under CWIS in this chapter. 


HOW TO KEEP UP-TO-DATE WITH NEW INTERNET 
RESOURCES 


The Internet is getting bulkier day by day, and new 
resources are being added to it every now and then. 
Therefore, looking for a specific piece of information on the 
Internet is like searching for a needle in a haystack. How 
does one know which is the best site from which to get the 
information one is looking for? Search tools, resource 
directories and current awareness services offer some 
solutions. Keeping up-to-date is essentially a matter of 
following appropriate discussion lists and Usenet newsgroups, 
where other people recommend or announce new resources 
which they have found on the Net or have used. Similarly, you 
can bookmark or save in a file, for future use, the URL of any 
source which you think is appropriate. There are also some 
sources available on the Internet which keep you posted about 
new resources. Examples include: 


Internet Resources Newsletter 
(http://www.hw.ac.uk/libWWW/irn/aboutirn.html) : Internet 
Resources Newsletter is a free monthly electronic newsletter, 
edited by Heriot-Watt University Library staff and published 
by Heriot-Watt University. This newsletter aims to raise 
awareness of new sources of information on the Internet, 
particularly those which are relevant to research interests at 
Heriot-Watt University, including engineering, science, and 
social sciences. The newsletter is available via the Internet 
and as an email newsletter to subscribers. 


Scout Report (http://rs.internic.net/scout/report) : It is 
a weekly publication describing new resources of interest to 
researchers and educators. 


SENN-Scientific and Engineering Network News 
(http://www.senn.com) : It includes top searches in various areas 
of science and engineering. 


5.10. INTRANETS 


In addition to the Internet, there is another term, 
Intranet. The general definition of an ‘Intranet’ is 
an organization’s internal communication system using 
Internet technology. Intranets use two of the key applications 
of the Internet: Web browsers, with their graphical user 
interfaces, and e-mail. They are being used to support a wide 
range of information services, including access to document 
collections in document management systems, and e-mail. 


5.10.1 Use of Intranets 


Intranet technology is attractive to organizations 
because: 


— The interface is easy to use. It also encompasses 
access to multimedia formats such as text, video, sound 
and graphical images. 


— A single interface to all formats of information, using the 
Internet open standards, removes the requirement for 
an organization's network to provide the several dedicated 
interfaces traditionally needed to interrogate proprietary 
systems such as databases, bibliographic information 
retrieval systems and management information systems. 
Also, the user needs to be familiar with only one interface. 


— Compared to the cost of employing proprietary 
information systems or groupware, Intranets are very 
inexpensive to set up. In addition, proprietary packages 
use in-house protocols, which often results in a 
dependency on the software distributor, as updates and 
utilities may only be acquired from the original vendor. 


— They provide improved access in a number of respects, 
such as: 


• Documents may be shared across all major 
networking platforms. 


• Information is accessible regardless of the user's 
location. 


• A workstation configured for use on an Intranet is 
also ready for Internet use if the necessary gateways 
are incorporated into the network. 


• Access and use by groups using the Intranet may 
be monitored, making it possible to assess the 
value of services and resources offered on the 
Intranet. 


• User authentication systems can be incorporated 
into browsers, so that access to information can be 
controlled. 


— They allow for maintenance of correct documents by 
offering access to electronic documents that will always 
be the latest version. This eliminates significant 
reprography and time spent trying to locate lost paper- 
based documents. 


5.10.2 Applications of Intranets 


The applications for which Intranets can be used 
depend upon the type of Intranet. Generally, there are two 
main types of Intranet: flat content/file Intranet and 
interactive Intranet. 


In a flat content or file Intranet, files are simply requested 
from a storage location or server, received by the desktop 
computer and viewed through the Web browser. Some 
applications of such Intranets are: telephone directories; 
newsletters; calendars; policy manuals; quality systems; 
document distribution and updating; current awareness 
bulletins; electronic journal delivery; Internet subject resource 
listings; library opening times, services and other contact 
information; and information skills support material. 


An interactive Intranet offers many opportunities for 
two-way communication within an organization. However, the 
configuration is slightly more complex. When a user wishes 
to send, change, respond to or forward any kind of information 
to another location or person, a specific programme or script 
is needed to process this information. These scripts need to 
be held on an internal web server, which uses TCP/IP. If an 
organization's network does not run under TCP/IP, then 
gateway or firewall software may be used as an interface 
between the organization's network protocol and the TCP/IP 
protocol. Applications with such a configuration include: e-mail, 
including messages and attached documents; computer-based 
training and learning, where packages can be authored and 
delivered in house; videoconferencing; interactive services 
such as reservation systems, order/purchase of documents, 
reports and surveys, facilitated by forms processing and 
‘mailto’ HTML (Hypertext Markup Language) commands; Web 
boards and on-line conferencing areas; support desks; on-line 
inquiry services; loan renewals; and access to the library 
catalogue from the World Wide Web. 
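
As a rough illustration of the kind of script an interactive Intranet needs, the sketch below takes the body of a submitted loan-renewal form and builds a reply. It is a minimal sketch only: the field names (borrower_id, item_id), the 14-day loan period and the function name process_renewal are assumptions made for illustration and do not describe any particular library system.

```python
from datetime import date, timedelta
from urllib.parse import parse_qs

LOAN_PERIOD_DAYS = 14  # assumed renewal period, for illustration only


def process_renewal(form_body: str, today: date) -> str:
    """Handle the URL-encoded body of a submitted HTML form and build a reply."""
    fields = parse_qs(form_body)
    # parse_qs returns a list of values per field; take the first, if any.
    borrower = fields.get("borrower_id", [""])[0]
    item = fields.get("item_id", [""])[0]
    if not borrower or not item:
        return "Error: borrower_id and item_id are both required."
    new_due = today + timedelta(days=LOAN_PERIOD_DAYS)
    return f"Item {item} renewed for {borrower}; now due {new_due.isoformat()}."
```

On a real Intranet a function of this kind would sit behind the internal web server and be called once for each form submission; the web server itself is left out here to keep the sketch short.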


Thus, Intranets can also be used for library and 
information services to some extent. 




Lastly, it can be said that change is inevitable, and to 
keep up one needs to learn about it and update oneself with 
it. The change is so rapid that one becomes out of date 
very soon, and keeping in tune with the latest technology 
is the only answer. No one knows what the future of the 
Internet is, but some things are known. The rate of growth of 
the Internet has been astounding, with the number of users 
doubling every six months. There are also rapid changes in the 
types of global communication, and most observers foresee a 
rapid convergence of voice communication, video transmission 
and data communication. So welcome to the Internet. You are 
now a citizen of the global village community. 


6 
Cataloguing Digital 
Resources : Metadata 
and Its Creation 


The catalogue is itself a collection—a collection of 
surrogates for items in the primary collection and these 
surrogates must be arranged as well. There is a highly 
articulated set of strategies for organizing catalogues, e.g. 
alphabetically by author and/or by subject. Current 
cataloguing practices involve both strategies. Surrogates are 
created for the items and arranged in a catalogue. The items 
themselves are also arranged—e.g. books are placed “linearly” 
on the shelves of a library’s stacks. This is typically 
accomplished via a classification scheme, such as the Library 
of Congress Classification (LCC), Colon Classification (CC) 
or the Dewey Decimal Classification (DDC), in which a 
hierarchy of possible subjects is given a linear ordering and 
each subject is given its place on the shelves. 
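
The two arrangements described above can be illustrated with a short sketch. The records below are invented sample data, and sorting on the class number as a plain decimal is a simplification: real call numbers carry further elements, such as author marks, that are ignored here.

```python
from dataclasses import dataclass


@dataclass
class Surrogate:
    """A catalogue entry standing in for an item in the collection."""
    author: str
    title: str
    ddc: str  # Dewey Decimal class number


# Invented sample records, for illustration only.
catalogue = [
    Surrogate("Sharma, P.", "Library Practice", "020"),
    Surrogate("Bose, R.", "Plant Life", "580"),
    Surrogate("Nair, K.", "Basic Mathematics", "510"),
]

# Catalogue arrangement: surrogates filed alphabetically by author.
by_author = sorted(catalogue, key=lambda s: s.author)

# Shelf arrangement: the subject hierarchy flattened into one linear order.
by_shelf = sorted(catalogue, key=lambda s: float(s.ddc))
```

The same three records appear in a different sequence in each list, which is the point of keeping surrogates separate from the items: the catalogue can be ordered in several ways while each book has only one place on the shelf.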


There are two types of cataloguing activity, both of which 
are practiced to catalogue a particular item — descriptive 
cataloguing and subject cataloguing. Descriptive cataloguing 
is concerned with creating catalogue records for items, 
describing their characteristics as just noted—author, title, and 
so on. Subject cataloguing is concerned with classifying the 
subject matter, the intellectual content, of an item. It is the 
subject cataloguer who assigns an item to a class within a 
classification scheme which in turn determines a place on 
the shelf. A distinction is also made between bibliographies 
and catalogues. Both of these in practice describe items. The 
difference is that a bibliography describes works and editions 
of works, but not actual physical items. 


A catalogue, by contrast, primarily describes particular, 
physical items in a particular collection. It does this partly by 
describing aspects of a work (e.g. title and author), as does a 
bibliography, and partly by indicating physical properties, 
including its location. The development of highly sophisticated, 
systematically organized catalogues and cataloguing 
procedures is a product of the modern library era, which dates 
from the second half of the last century. Book catalogues 
were the first kind used; these began to be displaced by the 
familiar card catalogues around the turn of the century. 


Digital catalogues (called OPACs, Online Public Access 
Catalogues) began to appear in the 1970’s and are now 
widespread; they are rapidly displacing card catalogues. 
Entries in OPACs are commonly encoded in MARC (Machine 
Readable Cataloguing) format — a standard which permits 
them to be shared among institutions. To a great extent, the 
MARC standard encodes those features previously recorded 
on cards; and it enables a fairly straightforward translation of 
card contents into digital form. 
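
To make the translation from card to MARC a little more concrete, the sketch below holds one record as a set of tagged fields. The tags used (100 for the main personal-name entry, 245 for the title statement, 260 for publication details, 650 for a topical subject heading) follow common MARC 21 usage, but the dictionary layout, the helper card_to_marc and the subject heading are assumptions made for illustration; real MARC records also carry indicators and subfield codes, which are left out.

```python
def card_to_marc(card: dict) -> dict:
    """Map the labelled parts of a catalogue card onto MARC-style field tags."""
    tag_for = {
        "author": "100",     # main entry, personal name
        "title": "245",      # title statement
        "publisher": "260",  # publication details
        "subject": "650",    # topical subject heading
    }
    return {tag_for[label]: value for label, value in card.items() if label in tag_for}


card = {
    "author": "Dhiman, Anil Kumar",
    "title": "Manual of Digital Libraries",
    "publisher": "New Delhi : Ess Ess Publications",
    "subject": "Digital libraries",  # assumed heading, for illustration
}
record = card_to_marc(card)

# Because the tags are standard, another institution can read the same record.
for tag in sorted(record):
    print(tag, record[tag])
```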


Digital resources can be created with no information 
about their provenance, change location or disappear without 
warning, have content which changes rapidly and lack any 
kind of quality control. Some more familiar media found in 
digital libraries also pose problems, such as still and moving 
images, which are especially difficult to describe for effective 
retrieval. Unfortunately, providing access to information 
remains a sophisticated task not amenable to automation. 
There are two levels, micro and macro, at which 
cataloguing of digital libraries might be considered. The 
“micro” scale of describing individual items has seen 
extensions to existing cataloguing rules to cope with the 
vagaries of digital information outlined above. 


Of fundamental importance in the digital library is the 
information represented by a link between two resources, 
such as scholarly papers, and there are several proposals 
for maintaining these relationships against passing changes 
in URLs. At the “macro” scale of describing collections or 
archives and ways to access them, there have been new 
approaches to integrating catalogues and much reflection on 
the purpose of those catalogues. Resources likely to be 
combined in a digital library are sufficiently various in type 
and distributed in location that it is sensible only to attempt 
interoperability rather than centralization. 
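
One idea commonly found in such proposals is indirection: a citing document stores a stable name, and a resolver table maps that name to wherever the resource currently lives. The sketch below illustrates the idea only; the identifier format, the URLs and the class LinkResolver are hypothetical and do not describe any specific scheme.

```python
class LinkResolver:
    """Maps stable identifiers to the current location of a resource."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier: str, url: str) -> None:
        self._locations[identifier] = url

    def move(self, identifier: str, new_url: str) -> None:
        """Record a change of location without touching any citing document."""
        if identifier not in self._locations:
            raise KeyError(f"unknown identifier: {identifier}")
        self._locations[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]


resolver = LinkResolver()
resolver.register("paper:0001", "http://old-host.example/papers/1.html")

citing_paper_link = "paper:0001"  # what the citing paper stores
resolver.move("paper:0001", "http://new-host.example/archive/1.html")
current_url = resolver.resolve(citing_paper_link)
```

The link held by the citing paper never changes; only the resolver's table is updated when the cited paper moves.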


For centuries, librarians have successfully managed, 
classified, and filtered information of many types by creating 
surrogates. This is true of traditional materials as well as new 
electronic resources. But now, as the amount of accessible 
electronic information increases, the cost of accessing this 
information will increase, and communities unfamiliar with 
library science are beginning to grapple with the problem of 
metadata and the organization of large collections of data. 
So there has been a general push to apply and develop 
techniques to make these resources searchable and more 
widely accessible. Now, the requirement is that every 
electronic item should have a catalogue entry or its equivalent. 


But all electronic resources can never be catalogued by 
hand. It is simply too expensive, and librarians are so 
overburdened that they can barely keep up with their 
traditional workload, let alone begin to catalogue and organize 
the vast amounts of information available electronically. 
Clearly, automated tools are needed to apply library science 
ideas like classification and filtering to electronic resources 
at high speed and low cost. The whole scenario of information 
management can be divided into two worlds. These two 
worlds, the seemingly unorganized Web and the organized 
world of libraries, have much to offer one another. 


The Web can offer automated tools for searching raw 
information and the library world can offer experience in 
organizing and understanding information of all types. By 
combining their talents and techniques, these two 
communities can bring powerful resources to bear on the 
problems of accessing, maintaining, and supplying electronic 
information. But this is not an easy task. Even with the 
latest technologies, searching the raw content of every 
document still seems almost impossible, since it is not 
uncommon to retrieve hundreds of documents for a given 
search. 


6.1. CHALLENGES TO TRADITIONAL CATALOGUING 
PRACTICES 


A user of the Web who has sampled any of the 
numerous search engines, e.g., Google, AltaVista, Excite, 
might argue that digital content, networks, and full-text 
indexing have made human-mediated organization through 
cataloguing obsolete. Web search engines demonstrate the 
great utility of such searching and the benefits of over 30 
years of research in information retrieval and, to a lesser 
degree, natural language processing. However, there are 
numerous inherent limitations to the technology underlying 
them: 


Scalability : Most Web search engines accumulate 
indexes by scanning the Web and downloading full content 
from sites. As the volume of Web content grows, it has 
become increasingly difficult to keep these indexes current 
or complete. One study indicates that even the best search 
engines index only about 12 to 15 percent of Web content. 




Even more problematic are the limitations of the information 
retrieval technology used in most popular Web search 
engines. The nature of the Web as a corpus presents some 
difficult scalability challenges for information retrieval and 
often leads to poor results. The sheer size of the corpus is a 
notable problem — a typical Web query will retrieve a very 
large set of potentially relevant documents. 


In addition, the Web corpus is usually presented as a 
single, unorganized collection of documents, which makes 
synonym clashes inevitable. Synonym clashes are well 
understood and can be addressed with a variety of techniques 
(e.g., thesauri, user feedback, local context analysis, phrase 
structure), but these techniques are generally not exploited 
by Web search engines. 
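
Of the techniques listed, a thesaurus is the simplest to sketch: the query term is expanded with its listed equivalents before matching. The tiny thesaurus and documents below are invented for illustration, and expand_query and search are hypothetical helper names.

```python
# Invented sample data, for illustration only.
THESAURUS = {
    "car": {"automobile", "motorcar"},
}

DOCUMENTS = {
    1: "history of the automobile industry",
    2: "car maintenance for beginners",
    3: "a guide to garden birds",
}


def expand_query(term: str) -> set:
    """Return the term together with any synonyms the thesaurus lists for it."""
    return {term} | THESAURUS.get(term, set())


def search(term: str) -> list:
    """Return ids of documents containing the term or one of its synonyms."""
    wanted = expand_query(term)
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if wanted & set(text.split())
    )
```

A search for "car" that matched only the literal word would miss document 1; with the expansion step both documents 1 and 2 are found.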


Access Limitations and Databases : While a great deal 
of useful content is freely available on the Internet, there is a 
growing and equally valuable portion of Internet content that 
is proprietary and held in protected systems. Much “valuable” 
content — from the point of view of the rights holder, is held in 
databases on special servers that require a password or other 
means of authorisation for access. These databases also 
provide enhanced functions beyond what can be done with 
simple, static Web pages. However, they do not in general 
support access via crawling, the method used to build most 
Internet search services. Thus, while Web indexers are able 
to access and index a large percentage of the total content 
on the Web, an ever-growing percentage of the most-sought- 
after material is not available from Internet search facilities. 


Format : Existing Web search engines are limited to 
textual content. They index words in documents and process 
textual queries (for example, “digital imaging”), returning lists 
of documents ranked according to the appearance of the 
query words in their content. Extending this approach to 
images will require tools to analyse image content and 
respond to queries such as “find images with cars in them” 
or, at an even more advanced level, to queries that ask for 
images with features similar to those of another digitized 
image. The tools to retrieve images, video segments, voice, 
and music are being actively researched but are currently 
beyond the capabilities of Internet search engines. 
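
The text-only approach described above can be sketched in a few lines: build an index from words to documents, then rank documents by how often the query words appear in them. The documents are invented sample data, and the scoring used here (a plain count of occurrences) is a deliberate simplification of what real search engines do.

```python
from collections import Counter, defaultdict

# Invented sample data, for illustration only.
DOCUMENTS = {
    "d1": "digital imaging for libraries and digital archives",
    "d2": "imaging techniques in medicine",
    "d3": "cataloguing rules for printed books",
}

# Inverted index: word -> {document id -> number of occurrences}.
index = defaultdict(Counter)
for doc_id, text in DOCUMENTS.items():
    for word in text.lower().split():
        index[word][doc_id] += 1


def rank(query: str) -> list:
    """Return document ids ordered by total occurrences of the query words."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, {}))
    # Highest score first; ties broken by document id for a stable order.
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
```

An image or a sound recording contributes no words to such an index, so it can never appear in the results unless some textual description of it has been supplied.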


Context : At a more abstract level, the usefulness of 
indexing based solely on the text content of a resource is 
compromised by lack of context. The best tools to help a 
person to locate a resource are those that are tailored to the 
context in which the resource occurs and to the knowledge 
context of the searcher. For example, content-based searches 
of MEDLINE, the medical index at the US National Library of 
Medicine, might be appropriate for a professional familiar with 
medical terminology and with the body of medical literature 
indexed by MEDLINE. However, a high-school student might 
not be able to select documents that are appropriate to his or 
her background and might not be familiar with medical terms, 
so he or she might find content-based searching to be difficult. 
The lack of context is a problem for both human and 
automatically generated representations, but different 
representations can often be combined to good effect. 


Markup : HTML, the main markup language of the Web, 
provides only a very simplified set of tags for labelling the 
parts of a document. These are primarily oriented to 
supporting the appropriate display of the document and in 
general tell little about the meaning of the various sections of 
the document. Many search engines utilize smart markup to 
provide more powerful retrieval facilities, allowing users to 
limit which parts of a document are used to satisfy the search 
argument. The simplicity of HTML markup severely constrains 
Internet search engines’ use of such facilities, which are very 
useful for limiting and refining search results. 
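
The benefit of structural markup can be shown with records whose parts are labelled, so that a search may be limited to one part. The field names and sample records here are assumptions made for illustration; plain HTML of the kind described above offers no reliable equivalent of them.

```python
# Invented sample records with labelled parts, for illustration only.
RECORDS = [
    {"title": "Metadata basics", "author": "Rao",
     "body": "an overview of cataloguing and indexing"},
    {"title": "Indexing the Web", "author": "Metadata Working Group",
     "body": "crawling and ranking"},
]


def fielded_search(term: str, field: str = None) -> list:
    """Return titles of records matching the term, optionally within one field only."""
    term = term.lower()
    hits = []
    for record in RECORDS:
        # Search either the named field alone or every part of the record.
        parts = [record[field]] if field else list(record.values())
        if any(term in part.lower() for part in parts):
            hits.append(record["title"])
    return hits
```

Searching for "metadata" anywhere returns both records, while limiting the search to the title field returns only the first; it is this kind of refinement that is lost when the parts of a document are not marked up.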


The creation of structured descriptive records for 
resources (e.g., traditional cataloguing records or, more 
generally, surrogates) can help to address some of these 
limitations. Scalability can improve if surrogates are used 
instead of the full content for indexing. Content providers may 
be more willing to distribute descriptive surrogates freely for 
indexing, in lieu of the full content. Surrogates can be created, 
and standards are being developed, for describing all manner 
of digital objects. Finally, surrogate records may include 
descriptive information that is not part of the document itself, 
usually the result of human analysis. 


For example, surrogates to facilitate searching 
MEDLINE by high-school students might associate more 
common medical subject terms with the resources, thus 
making searching easier for this community. On the other 
hand, there is broad agreement among the committee 
members and in the general information community that the 
nature of resource description needs a thorough examination 
in the context of digital resources. 


So the traditional cataloguing which is one kind of 
resource description, in turn, is one kind of “metadata”, the 
information that describes the structure or content of a 
document but is not part of the document. But, the biggest 
barrier to effective use of the information available on the 
Internet is finding the right resource at the right time, given 
the fact that the Internet is not an organized body of 
information. It is, rather, an assembly of unorganized 
information, and hence it cannot provide the kind of 
sophisticated indexing and search capabilities that the 
commercial databases offer. Although there are thousands
of search engines, meta search engines, directories and
specialty search engines that a user can use for conducting
a search on a given topic, a search conducted using
any Internet search engine returns too many links to unfiltered
information resources. A user is confronted with the daunting
task of locating relevant information from a great volume of
contents consisting of links to information resources of varying
quality. The web has grown so rapidly and
in so many areas of society that bibliographic control of the
resources available on the Internet has become an urgent
necessity if it is to mature into a reliable and effective medium
of communication.


Because of this rapid growth, librarians and information 
professionals are developing a variety of solutions for bringing 
the explosion of web resources under control. The 
unorganized, unstructured and uncontrolled information 
contents of the Internet call for an attempt on the part of the 
librarians and information professionals to provide structured 
and systematic access to information available on Internet 
by: 


— Selecting, evaluating and filtering information content 
of the Internet; 


— Classifying and cataloguing these resources 
considering their inherent characteristics; and 


— Providing an organized and structured guide to the 
resources selected in the first step through a meta- 
resource site or through the library's home page. 


The role of a library is to provide organized and
structured access to its collection for its users. The access
infrastructure in a library usually consists of a library catalogue
or the library OPAC. The library selects its printed resources
after careful evaluation of the resources to be acquired before 
they are added to the collection. The resources, added in the 
process of collection development, are classified and 
catalogued so as to ensure that they reach their respective 
users. The information resources available on the Internet
cannot be treated differently from those available in the print
media within a library. The Internet and the web are merely
new media that provide access to different types of
information resources. It is, therefore, imperative for the
libraries to apply their expertise in the process of selection,
evaluation and filtration of information resources and to
provide organized and structured access to them. A meta
resource site consisting of information resources that are
carefully selected by librarians and information specialists
serves the users best through its value-added characteristics,
providing intuitive access to a select few high-quality
information resources.


6.2. WHAT IS METADATA? 


Metadata is structured information that describes, 
explains, locates, or otherwise makes it easier to retrieve, 
use or manage an information resource. Metadata is often 
called “data about data” or “information about information”. 
The term metadata is used differently in different 
communities— some use it to refer to machine understandable 
information, while others use it only for records that describe 
electronic resources. However, in the library environment, 
metadata is commonly used for any formal scheme of 
resource description, applying to any type of object, digital or 
non-digital. Traditional library cataloging is a form of metadata, 
and MARC 21 and the rulesets used with it such as AACR-2 
are metadata standards. 


A formal definition is : “metadata are data associated
with objects which relieves their potential users of having to
have full advance knowledge of their existence or
characteristics”. In the context of Internet resources, the term
is also applied loosely to a meta resource, that is,
an organized and structured guide to Internet-based
electronic information resources that are carefully selected
after a predefined process of evaluation and filtration in a
subject area or specialty. Meta resources are often independent
websites or part of an institution's or library's website that serve
as a guide to Internet resources considered appropriate for
their target audiences. A meta resource site that is a part of
an institutional website or the library’s website may include
resources that are on subscription by the parent organization
or are accessible for free to all. A meta resource site may
also be built by a commercial enterprise and be accessible
free of cost up to the bibliographic level. However, a user may
be required to pay if he / she wishes to access the full-text.
The home pages of all the major educational and research 
institutions, especially in the developed world, provide an 
organized and structured guide to electronic resources 
available on the Internet. 


6.2.1. Basic Characteristics of Meta Data 


A meta resource site serves as a discovery tool, enabling users
to quickly locate the most relevant Internet resources. It directs
its users towards contents that are freely available although
difficult to find using a non-specific search engine. A meta
resource site is reliable because the resources
included in it are selected on the basis of predefined selection
criteria, catalogued by following consistent practices and
analyzed by people with expertise in the relevant subject
discipline. Links on a meta resource site are checked frequently in an
automated process and entries are updated regularly by
subject specialists. A meta resource site offers a community of users
a single entry point to resources for a given topic or set of
topics. Some basic characteristics of good metadata are as
follows:


— Good metadata should be appropriate to the materials 
in the collection, users of the collection, and intended, 
current, and likely use of the digital object. Good 
metadata supports interoperability. 


— Good metadata uses standard controlled vocabularies 
to reflect the what, where, when, and who of the content. 


— Good metadata includes a clear statement on the 
conditions and terms of use for the digital object. 




— Good metadata records are objects themselves and
therefore should have the qualities of good objects, 
including archivability, persistence, unique 
identification, and so forth. Good metadata should be 
authoritative and verifiable. 


— Good metadata supports long-term management of 
objects in collections. 


6.3. TYPES OF METADATA 


Metadata support efficient and effective organization,
access and retrieval of information contents in a digital library.
Besides metadata that provide access to the intellectual contents of a
document, a function analogous to that of bibliographic records,
digital objects also require metadata about the applications and
formats used for creating them. Such metadata is
required to provide long-term access to a digital resource.
The following four types of metadata are associated with
digital objects:


6.3.1. Descriptive Metadata 


Descriptive metadata are used to describe the textual or non-textual
contents of a digital object. They include content or
bibliographic description consisting of keywords and subject
descriptors that may be assigned using a controlled vocabulary
or thesaurus such as MeSH, INSPEC, or the Library of Congress
Subject Headings (LCSH).


6.3.2. Administrative or Technical Metadata 


These consist of information necessary to allow a 
repository to manage digital objects contained in it. 
Administrative metadata incorporate details on original 
source, date of creation / scanning, version of digital object, 
file format used, compression technology used, object 
relationship, etc. Administrative metadata also include
copyright and licensing information and information that is
necessary for the long-term preservation of the digital objects. 
Administrative data may reside within or outside the digital 
object and is required for long-term collection management 
to ensure longevity of digital collection. 


6.3.3. Structural Metadata 


These are the elements within digital objects that 
facilitate navigation, e.g. table of contents, index at issue level 
or volume level, page turning in an electronic book, etc. 


6.3.4. Identification Metadata 


These metadata are used for tracking different versions
and editions of the same digital work, e.g. PDF, HTML, PostScript,
MS Word, etc. for text and TIFF, JPG, BMP, etc. in the case of images.


But, only descriptive metadata are visible to the users 
since they facilitate searching and browsing operations, and 
indicate the value of items in the collection. Administrative or 
technical metadata are used for long-term maintenance of 
digital collection. 


In general, metadata can describe resources at any 
level of aggregation. They can describe a collection, a unitary 
resource, or a component part of a larger resource — for 
example, a photograph in an article. Just as cataloguers make 
decisions about whether a catalogue record should be created 
for a whole set of volumes or for each particular volume in 
the set, so the metadata creator makes similar decisions. 
Metadata can also be used for description at any level of the
information model laid out in the IFLA (International
Federation of Library Associations and Institutions) Functional
Requirements for Bibliographic Records
(http://www.ifla.org/VII/s13/frbr/frbr.pdf): work, expression, manifestation, or item.
For example, a metadata record could describe a report, a
particular edition of the report, or a specific copy of that edition
of the report.


Metadata can be embedded in a digital object or they 
can be stored separately. Metadata are often embedded in 
HTML documents and in the headers of image files. Storing 
metadata with the object it describes ensures that the metadata
will not be lost, obviates problems of linking between data
and metadata, and helps to ensure that the metadata and 
object will be updated together. However, it is impossible to 
embed metadata in some types of objects — for example, 
artifacts. Also, storing metadata separately can simplify the 
management of the metadata itself and facilitate search and 
retrieval. Therefore metadata are commonly stored in 
database systems and linked to the objects described. 
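
As an illustration of metadata embedded in an HTML document, the following is a minimal sketch in Python that reads the name/content pairs carried in the <meta> tags of a page header. The sample page, the class name MetaTagCollector and the function name extract_embedded_metadata are all invented for this illustration; they are assumptions and not part of any system described in this chapter.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # An element may occur more than once, so keep a list per name.
            self.metadata.setdefault(name.lower(), []).append(content)


def extract_embedded_metadata(html_text):
    """Return the metadata embedded in the <meta> tags of html_text."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.metadata


# Hypothetical page, invented for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Annual Report</title>
<meta name="author" content="A. Librarian">
<meta name="keywords" content="digital libraries, metadata">
</head><body><p>Report text.</p></body></html>"""

embedded = extract_embedded_metadata(SAMPLE_PAGE)
for name, values in embedded.items():
    print(name, "=", values)
```

Because the metadata travel inside the file, copying the page copies its description as well. A separately stored record would instead need a link back to the object it describes.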


6.4. WHAT DOES METADATA DO? 


An important reason for creating descriptive metadata
is to facilitate discovery of relevant information. In addition to 
resource discovery, metadata can help to organize electronic 
resources, facilitate interoperability and legacy resource 
integration, support digital identification, and support archiving 
and preservation. 


6.4.1. Resource Discovery 


Identification of information is known as resource 
identification. Today in the semantic web environment, 
information is treated as objects. In this context resource 
identification comes to be known as Resource Discovery. 


Metadata serves the same functions in resource 
discovery as good cataloging does by: 


— allowing resources to be found by relevant criteria;

— identifying resources;

— bringing similar resources together;

— distinguishing dissimilar resources;

— giving location information.


6.4.2. Organizing Electronic Resources 


As the number of Web-based resources grows
exponentially, aggregate sites or portals are increasingly
useful in organizing links to resources based on audience or
topic. Such lists can be built as static web pages, with the
names and locations of the resources “hard coded” in the
HTML. However, it is more efficient and increasingly more
common to build these pages dynamically from metadata
stored in databases. Software tools such as ColdFusion®
can be used to automatically extract and reformat the
information for web applications
(http://www.allaire.com/Products/coldfusion/).
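
The idea of building a resource list dynamically from stored metadata can be sketched as follows. This is only an illustration in Python using an in-memory SQLite database, not a description of how ColdFusion works; the table layout, the function name build_portal_page and the "Example Medical Gateway" entry are assumptions made for the example.

```python
import sqlite3
from html import escape


def build_portal_page(connection, topic):
    """Generate an HTML list of links for one topic from stored metadata."""
    rows = connection.execute(
        "SELECT title, url FROM resources WHERE topic = ? ORDER BY title",
        (topic,),
    ).fetchall()
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in rows
    )
    return f"<h1>{escape(topic)}</h1>\n<ul>\n{items}\n</ul>"


# Hypothetical metadata store: one row per described resource.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE resources (title TEXT, url TEXT, topic TEXT)")
connection.executemany(
    "INSERT INTO resources VALUES (?, ?, ?)",
    [
        ("Dublin Core Metadata Initiative", "http://dublincore.org/", "Metadata"),
        ("Open Archives Initiative", "http://www.openarchives.org", "Metadata"),
        ("Example Medical Gateway", "http://medical.example.org/", "Medicine"),
    ],
)

page = build_portal_page(connection, "Metadata")
print(page)
```

Adding or correcting a row in the database changes every page generated from it, whereas a hard-coded page has to be edited by hand.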


Another method of organizing Web information is 
through channels. Channels are preselected Web sites that 
automatically “push” streams of information to a user’s 
browser, commonly used for continuously updated 
information such as stock quotes and news. The dominant 
metadata scheme for webcasting is the Channel Definition 
Format (CDF) developed by Microsoft and its partners. 


6.4.3. Interoperability 


Describing a resource with metadata allows it to be 
understood by both humans and machines in ways that 
promote interoperability. Interoperability is the ability of 
multiple systems, with different hardware and software 
platforms, data structures, and interfaces, to exchange data 
with minimal loss of content and functionality. Using defined 
metadata schemes, shared transfer protocols, and 
crosswalks between schemes, resources across the network 
can be searched more seamlessly. 


Two approaches to interoperability are cross-system
search and metadata harvesting. The Z39.50 protocol is
commonly used for cross-system search. Z39.50 partners
do not share metadata but map their own search capabilities
to a common set of search attributes. A contrasting approach
taken by the Open Archives Initiative (http://www.openarchives.org)
is for all partners to translate their
native metadata to a common core set of elements and 
expose this for harvesting. A search service then gathers the 
metadata into a consistent central index to allow cross- 
repository searching regardless of the metadata formats used 
by participating repositories. 
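
The harvesting approach can be illustrated with a small sketch in Python. It is not an implementation of the Open Archives protocol itself; it only shows the two steps described above, a crosswalk from native fields to a common core followed by gathering into one central index. The repository names, the crosswalk tables and the sample records are hypothetical.

```python
# Hypothetical crosswalks: native field name -> common core element.
CROSSWALKS = {
    "library_catalogue": {"245": "title", "100": "creator", "650": "subject"},
    "image_archive": {
        "caption": "title",
        "photographer": "creator",
        "keywords": "subject",
    },
}


def to_common_core(repository, native_record):
    """Translate one native record; fields without a mapping are dropped."""
    mapping = CROSSWALKS[repository]
    return {
        mapping[field]: value
        for field, value in native_record.items()
        if field in mapping
    }


def harvest(repositories):
    """Gather the exposed common-core records into one central index."""
    central_index = []
    for name, records in repositories.items():
        for record in records:
            exposed = to_common_core(name, record)
            exposed["repository"] = name
            central_index.append(exposed)
    return central_index


def search(central_index, element, term):
    """Case-insensitive substring search on one common-core element."""
    term = term.lower()
    return [r for r in central_index if term in r.get(element, "").lower()]


# Invented sample data for two repositories with different native formats.
repositories = {
    "library_catalogue": [
        {
            "245": "Campus strategies for libraries and electronic information",
            "650": "Academic libraries",
            "300": "xi, 404 p.",
        }
    ],
    "image_archive": [
        {
            "caption": "Reading room of an academic library",
            "photographer": "Example Photographer",
            "keywords": "Academic libraries; interiors",
        }
    ],
}

index = harvest(repositories)
hits = search(index, "subject", "academic libraries")
print([hit["repository"] for hit in hits])
```

The physical description (field 300) has no counterpart in the common core used here and is lost in translation, which illustrates why interoperability is defined in terms of minimal rather than zero loss of content.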


6.4.4. Digital Identification 


Most metadata schemes include elements such as 
standard numbers to uniquely identify the work or object to 
which the metadata refers. The location of a digital object 
may also be given using a file name, URL, or some more 
persistent identifier such as a Persistent URL (PURL) or the 
Digital Object Identifier (DOI). 


Persistent identifiers are preferred because file 
locations change frequently, making the URL invalid. In 
addition to the actual elements that point to the object, the 
metadata can be combined to act as a set of identifying data, 
differentiating one object from another for validation purposes. 
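
Why a persistent identifier outlives a file location can be shown with a toy resolver in Python. This is a sketch under stated assumptions: the class IdentifierResolver, the identifier "example:report-42" and both URLs are invented, and real PURL or DOI services are considerably more elaborate.

```python
class IdentifierResolver:
    """Toy resolver: maps a persistent identifier to the current location."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # Registering an identifier again replaces its current location.
        self._locations[identifier] = url

    def resolve(self, identifier):
        return self._locations[identifier]


resolver = IdentifierResolver()
resolver.register(
    "example:report-42", "http://old-server.example.org/reports/42.pdf"
)

# The metadata record stores only the persistent identifier, not the URL.
metadata_record = {"title": "Annual Report", "identifier": "example:report-42"}

# The file moves: only the resolver entry is updated, the record is untouched.
resolver.register(
    "example:report-42", "http://new-server.example.org/archive/42.pdf"
)

current_location = resolver.resolve(metadata_record["identifier"])
print(current_location)
```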


6.4.5. Archiving and Preservation 


Most current metadata efforts center on the discovery of
recently created resources. However, there is a growing
concern that digital resources will not survive in usable form 
into the future. Digital information is fragile and it can be 
corrupted or altered, intentionally or unintentionally. It may 
become unusable as storage media and hardware and 
software technologies change. 


Format migration and perhaps emulation of current
hardware and software behaviour in future hardware and
software platforms are strategies for overcoming these
challenges. Metadata are the key to ensuring that resources 
will survive and continue to be accessible into the future. 
Archiving and preservation require special elements to track 
the lineage of a digital object, to detail its physical 
characteristics, and to document its behaviour in order to 
emulate it on future technologies. 


Many organizations internationally are working on
defining metadata schemes for digital preservation, including
the National Library of Australia
(http://www.nla.gov.au/padi/topics/32.html),
the British Cedars (CURL Exemplars in Digital Archives) Project
(http://www.leeds.ac.uk/cedars/metadata.html),
and a joint Working Group of OCLC and the Research
Libraries Group (RLG)
(http://www.oclc.org/digitalpreservation/presmeta_wp.pdf).
Many of these
initiatives are based on or compatible with the ISO Reference
Model for an Open Archival Information System (OAIS) which
incorporates preservation metadata along with descriptive,
administrative, and rights management metadata
(http://www.ccsds.org/RP9905/RP9905.html).


6.5. METADATA STANDARDS 


Metadata schemes, or standards, are sets of
metadata elements designed for a particular purpose, for
example, to describe a particular type of information resource.
The definition or meaning of the elements themselves is
known as the semantics of the scheme. The values given to
metadata elements are the content. Metadata schemes
generally specify names of elements and their semantics.
Optionally, they may specify content rules for how content
must be formulated (for example, how to identify the main
title) and/or representation rules for how content must be
represented (for example, capitalization rules). There may
also be syntax rules for how the elements and their content
should be encoded.




A metadata scheme with no prescribed syntax rules is
called syntax independent. Metadata can be encoded in
MARC, in “keyword=value” pairs, or in any other definable
syntax. Many current metadata schemes use SGML or XML.
SGML (Standard Generalized Mark-up Language) is the parent
standard and allows for the richest mark-up of a document;
HTML is one application of it. XML (Extensible Mark-up
Language) is a simplified subset of SGML which, unlike HTML,
allows for locally defined tag sets and the easy
exchange of structured information.
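
Syntax independence can be made concrete with a short sketch in Python: the same three elements and their content are written out once as “keyword=value” pairs and once as XML, and both encodings decode to the same record. The element names, the wrapper tag "record" and the function names are assumptions chosen for the illustration and do not belong to any particular scheme.

```python
import xml.etree.ElementTree as ET

# Element names and content, independent of any encoding syntax.
record = [
    ("title", "Campus strategies for libraries and electronic information"),
    ("creator", "Arms, Caroline R."),
    ("date", "1990"),
]


def encode_as_pairs(elements):
    """Encode the record as one keyword=value pair per line."""
    return "\n".join(f"{name}={value}" for name, value in elements)


def encode_as_xml(elements):
    """Encode the same record as XML with one child tag per element."""
    root = ET.Element("record")
    for name, value in elements:
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def decode_pairs(text):
    """Recover (name, value) tuples from the keyword=value encoding."""
    return [tuple(line.split("=", 1)) for line in text.splitlines()]


def decode_xml(text):
    """Recover (name, value) tuples from the XML encoding."""
    return [(child.tag, child.text) for child in ET.fromstring(text)]


pairs_text = encode_as_pairs(record)
xml_text = encode_as_xml(record)
print(pairs_text)
print(xml_text)
```

The semantics and the content stay the same in both cases; only the syntax differs.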


There are a number of metadata standards. Some of 
these schemes are applicable to documents received in a 
library, others have broader scope. These metadata 
standards attempt to describe the author, the work, and the 
context in which the work was produced in a way that will be 
useful to the researcher as well as the librarians and / or 
technical staff maintaining the work in its electronic form. 


A few of the most common ones are mentioned below:

• MARC 21;

• Dublin Core;

• Global Information Locator Service (GILS);

• Text Encoding Initiative (TEI) Header;

• Encoded Archival Description (EAD);

• Visual Resources Association (VRA) Core Categories;

• ONIX International;

• Common Communication Format (CCF);

• MARCXML;

• Metadata Encoding and Transmission Standard (METS);

• Metadata Object Description Schema (MODS).


6.5.1. MARC 21 Concise Bibliographic Data Format 


MARC 21 Format for Bibliographic Data is designed to 
be a carrier for bibliographic information about printed and 
manuscript textual materials, computer files, maps, music, 
continuing resources, visual materials, and mixed materials. 
Bibliographic data commonly includes titles, names, subjects, 
notes, publication data, and information about the physical 
description of an item. The bibliographic format contains data 
elements for the following types of material: 


Books : Textual material that is monographic in nature. 


Continuing Resources : Textual items with a recurring 
pattern of publication, e.g., periodicals, newspapers, 
yearbooks. 


Computer Files : This is used for computer software, 
numeric data, computer-oriented multimedia, online systems 
or services. Other classes of electronic resources are coded 
for their most significant aspect. Material may be monographic 
or serial in nature. 


Maps : All types of cartographic materials, including
sheet maps and globes, in printed, manuscript, electronic, and
microform versions.


Music : It includes printed and manuscript notated 
music. 


Sound Recordings : This includes nonmusical sound 
recordings, and musical sound recordings. 


Visual Materials : This is used for projected media, two- 
dimensional graphics, three-dimensional artifacts or naturally 
occurring objects, and kits. Also, used for archival visual 
materials when format or medium is being emphasized. 


Mixed Materials : It is for primarily archival and 
manuscript collections of a mixture of forms of material. 
Material may be monographic or serial in nature. 


A MARC Record sample is shown in Panel 6.1.

Panel 6.1. A MARC Record


Consider a monograph for which the bibliographic 
citation might be written as follows: 


Caroline R. Arms, editor. Campus strategies for
libraries and electronic information. Bedford, Mass.:
Digital Press, 1990.


A search of the Library of Congress catalogue, done 
with a terminal-based interface, displays the entry for 
the aforementioned work in a form that shows the 
information in the underlying MARC record:


&001 89-16879 r93
&050 Z675.U5C16 1990
&082 027.7/0973 20
&245 Campus strategies for libraries and electronic
information/Caroline Arms, editor.
&260 {Bedford, Mass.}: Digital Press, c1990.
&300 xi, 404 p.: ill.; 24 cm.
&440 EDUCOM strategies series on information
technology
&504 Includes bibliographical references (p. {373}-381).
&020 ISBN 1-55558-036-X: $34.95
&650 Academic libraries—United States—Automation.
&650 Libraries and electronic publishing—United
States.
&650 Library information networks—United States.
&650 Information technology—United States.
&700 Arms, Caroline R. (Caroline Ruth)
&040 DLC DLC DLC
&043 n-us---
&955 CIP ver. br02 to SL 02-26-90
&985 APIF/MIG




The information is divided into fields, each with a three-digit
code. For example, the 440 field is the title of a
monograph series, and the 650 fields are Library of
Congress subject headings. Complex rules tell the
cataloguer which fields should be used and how 
relationships of elements should be interpreted. 


The actual coding is more complex than what is shown 
here. The full MARC format consists of a pre-defined 
set of fields, each identified by a tag. Subfields are 
permitted. Fields are identified by three-digit numeric 
tags, sub-fields by single letters. To get a glimpse of 
how information is encoded in this format, consider 
the 260 field, which begins with &260. In an actual 
MARC record, this is encoded as follows, where the 
string “abc” indicates that there are three subfields. 


&2600#abc#{Bedford, Mass.} :#Digital Press, 
#c1990.% 


The first subfield, indicated by the tag a, gives the place 
of publication; the next, indicated by the tag b, gives 
the publisher; the third, indicated by the tag c, gives 
the date. 
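
The simplified encoding shown in this panel can be taken apart with a few lines of Python. The sketch below follows only the notation printed in the panel (a leading "&", a three-digit tag, one further character, the list of subfield codes, "#" between values and a closing "%"); it is not a parser for the real MARC exchange format. The function name parse_panel_field is invented, and because the panel does not explain the single character that follows the tag, the sketch simply stores it unchanged under the label "flag".

```python
def parse_panel_field(encoded):
    """Split one field written in the panel's simplified notation."""
    text = encoded.strip()
    if not (text.startswith("&") and text.endswith("%")):
        raise ValueError("field must start with '&' and end with '%'")
    body = text[1:-1]
    tag = body[:3]    # three-digit field tag, e.g. 260
    flag = body[3]    # character after the tag; meaning not given in the panel
    parts = body[4:].split("#")
    # parts[0] is the empty text before the first '#'; parts[1] lists the codes.
    codes = parts[1]
    values = parts[2:]
    if len(codes) != len(values):
        raise ValueError("number of subfield codes and values differ")
    subfields = [(code, value.strip()) for code, value in zip(codes, values)]
    return {"tag": tag, "flag": flag, "subfields": subfields}


# The 260 field exactly as printed in the panel, joined onto one line.
field_260 = parse_panel_field(
    "&2600#abc#{Bedford, Mass.} :#Digital Press, #c1990.%"
)
for code, value in field_260["subfields"]:
    print(code, "->", value)
```

Subfield a then holds the place of publication, b the publisher and c the date, as described above.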


MARC bibliographic records are distinguished from all
other types of MARC records by specific codes in Leader/06
(Type of record), which identify the following bibliographic
record types:

Language (textual) material
Manuscript language (textual) material
Computer file
Cartographic material
Manuscript cartographic material
Notated music
Manuscript music
Nonmusical sound recording
Musical sound recording
Projected medium
Two-dimensional nonprojectable graphic
Three-dimensional artifact or natural objects
Kit
Mixed material
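
Reading the record type from the leader can be sketched in Python as a simple lookup on character position 06. The one-letter codes in the table below are not printed in the list above; they are supplied from the MARC 21 bibliographic format as an assumption of this sketch and should be checked against the current MARC 21 documentation before use. The example leader string and the function name are likewise invented for the illustration.

```python
# Leader/06 codes for bibliographic records (assumed from the MARC 21
# bibliographic format; verify against the official documentation).
LEADER_06_RECORD_TYPES = {
    "a": "Language material",
    "t": "Manuscript language material",
    "m": "Computer file",
    "e": "Cartographic material",
    "f": "Manuscript cartographic material",
    "c": "Notated music",
    "d": "Manuscript notated music",
    "i": "Nonmusical sound recording",
    "j": "Musical sound recording",
    "g": "Projected medium",
    "k": "Two-dimensional nonprojectable graphic",
    "r": "Three-dimensional artifact or naturally occurring object",
    "o": "Kit",
    "p": "Mixed materials",
}


def bibliographic_record_type(leader):
    """Return the record type named by Leader/06, or None if not listed."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    # Positions are counted from zero, so Leader/06 is the seventh character.
    return LEADER_06_RECORD_TYPES.get(leader[6])


# Made-up leader for a printed book, used only as an example.
EXAMPLE_LEADER = "00714cam a2200205 a 4500"
print(bibliographic_record_type(EXAMPLE_LEADER))
```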




A fill character (hexadecimal value ‘7C’), represented in ASCII as
a vertical bar (|), may be used in bibliographic records in some
positions in fields 006, 007 and 008, and subfield $7 of the 
linking entry fields (760-787). A fill character may not be used 
anywhere in the leader, or in tags, indicators, or subfield 
codes. The use of the fill character in records contributed to 
a national database may also be dependent upon the national 
level requirements specified for each data element. The 
presence of a fill character in a bibliographic record indicates 
that the format specifies a code to be used but the creator of 
the record has decided not to attempt to supply a code. 


Besides, the following typographical conventions are 
used: 

0 - The graphic 0 represents the digit zero in tags, fixed- 
position character position citations, and indicator positions. 
This character is distinct from an uppercase letter O used in 
examples or text. 

# - The graphic symbol # is used for a blank (hex 20) in 
coded fields and in other special situations where the 
existence of the character blank might be ambiguous. 


$ - The graphic symbol $ is used for the delimiter (hex 
1F) portion of a subfield code. Within the text, subfield codes 
are referred to as subfield $a, for example. 


/ - Specific character positions of fixed-length data 
elements, such as those in the Leader, Directory, and field 
008, are expressed using a slash and the number of the 
character position, e.g., Leader/06. 

1 - The graphic 1 represents the digit one (hex 31).
This character must be distinguished from a lowercase roman
alphabet letter l (el) (hex 6C) and the uppercase alphabetic
letter I (eye) (hex 49) in examples or text.


| - The graphic | represents a fill character (hex 7C). 


6.5.2. Dublin Core 


The Dublin Core Metadata Element Set arose from 
discussions at a 1995 workshop sponsored by OCLC and 
the National Center for Supercomputing Applications (NCSA). 
As the workshop was held in Dublin, Ohio, the element set 
was named the Dublin Core. The continuing development of 
the Dublin Core and related specifications is managed by the 
Dublin Core Metadata Initiative (DCMI) (http://dublincore. 
org/). The original objective of the Dublin Core was to define 
a set of elements that could be used by authors to describe 
their own Web resources. Faced with a proliferation of 
electronic resources and the inability of the library profession 
to catalogue all these resources, the goal was to define a few 
elements and some simple rules that could be applied by 
non-cataloguers. 


The original 13 core elements were later increased to 
15: 


Title, subject, description, source, language, relation,
coverage, creator, publisher, contributor, rights, date, type,
format, and identifier.


A detailed description of the elements is given in Panel
6.2.


Panel 6.2. The Dublin Core 


The following fifteen elements constitute the metadata 
set of the Dublin Core. 


All elements are optional, and all can be repeated. 


Title : The name given to the resource by the creator 
or publisher. 


Creator : The person or organization primarily
responsible for the intellectual content of the resource
(authors in the case of written documents; artists,
photographers, or illustrators in the case of visual
resources).


Subject : The topic of the resource. Typically, subject 
will be expressed as a keyword or a phrase that 
describes the subject or the content of the resource. 
The use of controlled vocabularies and formal 
classification schemes is encouraged. 


Description : A textual description of the content of 
the resource, including abstracts in the case of 
document-like objects and content descriptions in the 
case of visual resources. 


Publisher : The entity responsible for making the 
resource available in its present form—e.g., a 
publishing house, a university department, or a 
corporate entity. 


Contributor : A person or organization not specified 
in a creator element that has made significant 
intellectual contributions to the resource but whose 
contribution is secondary to any person or organization 
specified in a creator element—e.g., an editor, a 
transcriber, an illustrator. 


Date : A date associated with the creation or availability 
of the resource. 


Type : The category of the resource—e.g., home page, 
novel, poem, working paper, preprint, technical report, 
essay, dictionary. 


Format : The data format of the resource used to 
identify software and possibly hardware that might be 
needed to display or operate the resource. 


Identifier : A string or number used to uniquely identify
the resource. Examples for networked resources
include URLs and URNs.


Source : Information about a second resource from 
which the present resource is derived. 


Language : The language of the intellectual content
of the resource.


Relation : An identifier of a second resource and its
relationship to the present resource. This element
permits links between related resources and resource 
descriptions to be indicated. Two examples are an 
edition of a work and a chapter of a book. 


Coverage : Spatial locations and temporal durations 
characteristic of the resource. 


Rights : A rights-management statement, an identifier 
linked to such a statement, or an identifier linked to a 
service providing information about rights 
management for the resource. 


To make this discussion concrete, consider a
record created with the relevant portions of the
Dublin Core, and a sample syntax, that describes an electronic
version of Maya Angelou’s poem “On the Pulse of Morning”.
This description is based on a record created by the University 
of Virginia Library’s Electronic Text Center. 


• Subject : Poetry

• Title : On the Pulse of Morning

• Creator : Maya Angelou

• Publisher : University of Virginia Library Electronic Text
Center

• OtherAgent : Transcribed by the University of Virginia
Electronic Text Center

• Date : 1993

• Object : Poem

• Form : 1 ASCII file

• Identifier : AngPuls1

• Source : Newspaper stories and oral performance of
text at the presidential inauguration of Bill Clinton

• Language : English
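
One possible XML encoding of such a record is sketched below in Python. The record above uses the labels OtherAgent, Object and Form; mapping them to the elements contributor, type and format of Panel 6.2 is an assumption made for this sketch, as are the wrapper tag "metadata" and the function name. The namespace is the one published for the fifteen-element Dublin Core set.

```python
import xml.etree.ElementTree as ET

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_to_xml(pairs):
    """Encode (element, value) pairs as XML; elements may be repeated."""
    ET.register_namespace("dc", DC_NAMESPACE)
    root = ET.Element("metadata")
    for element, value in pairs:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        ET.SubElement(root, f"{{{DC_NAMESPACE}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# The sample record, with the three older labels mapped as described above.
angelou_record = [
    ("subject", "Poetry"),
    ("title", "On the Pulse of Morning"),
    ("creator", "Maya Angelou"),
    ("publisher", "University of Virginia Library Electronic Text Center"),
    ("contributor",
     "Transcribed by the University of Virginia Electronic Text Center"),
    ("date", "1993"),
    ("type", "Poem"),
    ("format", "1 ASCII file"),
    ("identifier", "AngPuls1"),
    ("language", "English"),
]

xml_record = dublin_core_to_xml(angelou_record)
print(xml_record)
```

Since every element is optional and repeatable, the function accepts any number of pairs in any order and rejects only names outside the fifteen-element set.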


The Dublin Core was developed to be simple and 
concise, and to describe Web-based documents. However,
Dublin Core has been used with other types of materials and
in applications demanding some complexity. 


But, there has historically been some tension between 
supporters of a “minimalist” view, who emphasize the need 
to keep the elements to a minimum and the semantics and 
syntax simple, and supporters of a “structuralist” view who 
argue for finer semantic distinctions and more extensibility 
for particular communities. 


These discussions have led to a distinction between 
qualified and unqualified (or simple) Dublin Core. Qualifiers 
can be used to refine an element, or to identify the encoding 
scheme used in representing an element value. The element 
“Date”, for example, can be used with the refinement qualifier 
“created” to narrow the meaning of the element to the date 
the object was created. “Date” can also be used with an 
encoding scheme qualifier to identify the format in which the 
date is recorded, for example, following the ISO 8601 standard 
for representing date and time. 
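
The two kinds of qualifier can be illustrated with a small Python sketch. The dictionary layout, the function names and the sample date value are assumptions made for the illustration; they do not reproduce any official Dublin Core syntax. Dropping the qualifiers yields a simple (unqualified) statement, while the encoding scheme tells a program how to interpret the value.

```python
from datetime import date

# A qualified statement: element, refinement and encoding scheme.
# The value is an illustrative date, not taken from any real record.
statement = {
    "element": "date",
    "refinement": "created",
    "scheme": "ISO 8601",
    "value": "1993-01-20",
}


def to_simple(qualified):
    """Drop the qualifiers, leaving an unqualified element/value pair."""
    return (qualified["element"], qualified["value"])


def interpret_value(qualified):
    """Use the encoding scheme, if recognised, to parse the value."""
    if qualified.get("scheme") == "ISO 8601":
        # Raises ValueError if the value is not a valid YYYY-MM-DD date.
        return date.fromisoformat(qualified["value"])
    return qualified["value"]


print(to_simple(statement))
print(interpret_value(statement))
```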


All Dublin Core elements are optional and all are 
repeatable. The elements may be presented in any order. 
While the Dublin Core description recommends the use of 
controlled values for fields where they are appropriate (e.g.,
controlled vocabularies for the Subject field), this is not 
required. However, working groups have been established 
to discuss authoritative lists for certain elements such as 
Resource Type. 


While Dublin Core leaves content rules to the particular 
implementation, the DCMI encourages the adoption of 
application profiles (domain-specific rules) for particular 
domains such as education and government. An application
profile for libraries is being developed by the Libraries Working
Group. Due to its simplicity, the Dublin Core is now used by
many outside the library community — researchers, museum 
curators, and music collectors to name only a few — because 
it does not require knowledge of highly specialized descriptive 
systems like AACR-2. 


There are hundreds of projects worldwide that use the 
Dublin Core either for cataloguing or to collect data from the 
Internet. The subjects range from cultural heritage and art to 
math and physics. 


6.5.3. Global Information Locator Service (GILS) 


GILS is a US Federal Information Processing Standard 
supported by various guidelines and memoranda from the 
Office of Management and Budget
(http://www.dtic.mil/gils/documents/naradoc/fip192.html). GILS grew out of the U.S.
government requirement for public access to government 
information, and it is authorized by the Paperwork Reduction 
Act of 1995. Originally called the “Government Information 
Locator Service”, GILS in various forms has been adopted 
by other governments and for international projects, leading 
to its current designation, “Global Information Locator 
Service”. 


GILS itself does not formally define metadata elements, 
rules for representation, and syntax. Rather, GILS specifies 
a profile of the Z39.50 protocol for search and retrieval, 
specifying which attributes must be supported. The Core GILS 
elements for the U.S. Federal GILS have been defined by 
the National Archives and Records Administration
(http://www.dtic.mil/gils/documents/naradoc/).

The original goal of GILS was to provide high-level 
locator records for government resources, both electronic and 
nonelectronic. GILS records were intended to describe 
aggregates such as catalogs, publishing services and
databases. The emphasis is on availability and distribution
rather than on description. Therefore, a GILS record may have 
data elements such as the name and address of the distributor 
and the order process. However, some organizations use
GILS at the level of the individual item, such as a journal article or
technical report. Since GILS was an early metadata scheme, evaluations
of its implementation and use are available and very valuable
in developing other metadata systems.


6.5.4. Text Encoding Initiative (TEI) Header


The Text Encoding Initiative (http://www.tei-c.org/) is 
an international project to develop guidelines for marking up 
electronic texts such as novels, plays, and poetry, primarily 
to support research in the humanities. This SGML markup 
becomes part of the electronic resource itself. In addition to 
specifying how to encode the text of a work, the TEI Guidelines 
also specify a header portion, embedded in the resource, that 
consists of metadata about the work. The TEI header, like 
the rest of the TEI, is defined as an SGML DTD, a set of tags 
and rules defined in SGML syntax that describe the structure 
and elements of a type of document. Since the TEI DTD is 
rather large and complicated in order to apply to a vast range 
of texts and uses, a simpler subset of the DTD, known as 
“TEI Lite”, is commonly used in libraries. 


It is assumed that TEI-encoded texts are electronic
versions of printed texts. Therefore the TEI Header can be 
used to record bibliographic information about both the 
electronic version of the text and about the non-electronic 
source version. The basic bibliographic information is not 
dissimilar to that recorded in library cataloguing and can be 
mapped to and from MARC. 

However, there are also elements defined to record 
details about how the text was transcribed and edited, how 
markup was performed, what revisions were made, and other
non-bibliographic facts. Libraries tend to use TEI headers
when they have collections of SGML-encoded full text. Some 
libraries use TEI headers to derive MARC records for their 
catalogue systems, while others use MARC records for the 
published source texts as the basis for creating TEI header 
descriptions. 


6.5.5. Encoded Archival Description (EAD) 


In archives and special collections, the finding aid is an 
important tool for resource description. Finding aids differ from 
catalogue records by being much longer, more narrative and 
explanatory, and highly structured in a hierarchical fashion. 
They generally start with a description of the collection as a 
whole, indicating what types of materials it contains and why 
they are important. If the collection consists of the personal 
papers of an individual there can be a lengthy biography of 
that person. The finding aid describes the series into which 
the collection is organized, such as correspondence, business 
records, personal papers, and campaign speeches, and ends 
with an itemization of the contents of the physical boxes and 
folders comprising the collection. 


The Encoded Archival Description (EAD) was 
developed as a way of marking up the data contained in a 
finding aid, so that finding aids can be searched and displayed 
online. The EAD standard is maintained jointly by the Library 
of Congress and the Society of American Archivists
(http://www.loc.gov/ead/). Like the TEI Header, the EAD is defined
as an SGML DTD. It begins with a header section that 
describes the finding aid itself (for example, who wrote it) 
which could be considered metadata about the metadata; it 
then goes on to the description of the collection as a whole 
and successively more detailed information. If individual items
being described exist in digital form, the EAD can include
pointers to the digital objects.


6.6.6. Visual Resources Association (VRA) Core Categories 


The VRA Core Categories is a metadata element set 
developed to describe visual materials such as buildings, 
photographs, paintings and sculptures
(http://www.gsd.harvard.edu/~staffaw3/vra/vracore3.htm).
Typically, visual resources collections used in teaching art 
history and similar subjects do not contain original art works 
but rather slides or photographs of the original art. Metadata 
for these materials therefore has to accommodate the 
description of multiple levels of related resources; for example, 
an original painting, a slide of the painting, a digitized image 
of the slide. Version 3.0 of the VRA Core Categories consists 
of 17 metadata elements which can be used as applicable to 
describe each of these versions and relate them to each other:
record type, type, title, measurements, material, technique, 
creator, date, location, ID number, style/period, culture, 
subject, relation, description, source and rights. Like the 
Dublin Core, the VRA Core scheme does not specify any 
particular syntax or rules for representing content. 


6.6.7. ONIX International


ONIX (Online Information Exchange) International is 
an XML-based metadata scheme developed by publishers 
under the auspices of a number of book industry trade groups 
in the United States and Europe
(http://www.editeur.org/onix.html). The original ONIX specification was a direct
response to the enormous growth in online book sales and 
the realization that books described with images, cover blurbs, 
reviews, and similar information significantly outsold books 
without this information. Therefore ONIX has elements to 
record a wide range of evaluative and promotional information 
as well as basic bibliographic and trade data. Although initially 
focused on the communication of book trade information to 
booksellers and distributors, ONIX is being expanded to 
accommodate other publication types and media, including
journals and journal articles, conference proceedings, and
electronic books. 


Although libraries are not currently creating ONIX format 
data directly, ONIX may play a role in the processing stream 
for Cataloguing in Publication (CIP) and in the creation of 
“provisional” or order-level bibliographic records. It is likely that
additional library uses of ONIX for books and for serials will 
be found as ONIX becomes more pervasive within the 
publishing community. Mappings between ONIX and both 
USMARC and UNIMARC exist and are available from the 
ONIX website. 


6.6.8. CCF 


The Common Communication Format (CCF) was 
developed by the Ad hoc Group set up for the Establishment
of a Common Communication Format in order to facilitate
exchange of bibliographic data between organizations. The 
first edition of the format was published in 1984, the second 
in 1988 and the third in 1992. The format has been developed 
as an ISO-2709 exchange format and adapted the second 
revised edition of the Anglo-American Cataloguing Rules 
(AACR) as a standard for rendering of information. The CCF 
enables an information provider to have a common format 
into which all data could be converted, and recipients of 
information would need to develop only one conversion 
program for incorporating incoming data from whatever 
source into their information system. In addition, if two or more 
organizations wish to exchange records with one another, it 
will be necessary for each of these organizations to agree 
upon a common standard format for exchange purposes. 
Each must be able to convert to an exchange-format record
from an internal-format record, and vice versa. If in any
network of organizations, whether national or international, 
there is a single standard exchange format, information 
interchange within that network will be greatly facilitated, both
technically and economically. But if each network has a
different standard format then information interchange
between different networks and among various bibliographic
agencies will still be so complex as to be uneconomical,
because of the number of computer programs that must be
written to accommodate the translation of records from one
format to another.
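The economic argument can be made concrete with a small count. The sketch below assumes that every conversion program works in one direction only and that every format must be convertible to every other; both are simplifying assumptions of this illustration, and the function names are invented for it.

```python
def pairwise_converters(n: int) -> int:
    """Programs needed when each of n formats converts directly to every other."""
    return n * (n - 1)


def exchange_converters(n: int) -> int:
    """Programs needed with one exchange format: one 'to' and one 'from' per format."""
    return 2 * n


for n in (3, 5, 10, 20):
    print(f"{n:>2} formats: direct={pairwise_converters(n):>3}  "
          f"via exchange format={exchange_converters(n):>2}")
```

The direct approach grows with the square of the number of formats, while the common-format approach grows only linearly, which is the saving the CCF was meant to deliver.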


6.6.9. MARCXML 


All MARC formats use the same technical framework
format, ISO 2709, an extremely flexible format for the
wrapping of bibliographic data. The conceptual model is not
outdated; however, the physical structure reflects the age of
punched cards and magnetic tapes. Information can now
be exchanged in a more modern XML wrapping,
MarcXchange, that builds on the same conceptual model as
ISO 2709. This initiative also comes from the USA, where the
XML schema MARCXML was developed in 2003. It is closely
associated with MARC21 and cannot be used for the other
MARC variants. The International Standardization Committee
for Information and Documentation, ISO TC46, therefore
decided in May 2003 to prepare a general XML schema,
following the same principles as MARCXML but generalized
so as to be able to contain all MARC formats.
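As a rough illustration of what an XML wrapping of MARC data looks like, the following Python sketch builds a minimal MARCXML-style record with the standard library. The helper names `q` and `build_record` are invented for this example, the sample values are illustrative, and the leader and control fields that a complete record would normally carry are omitted for brevity.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"  # namespace of the MARCXML schema
ET.register_namespace("marc", MARCXML_NS)


def q(name: str) -> str:
    """Return the namespace-qualified tag name that ElementTree expects."""
    return f"{{{MARCXML_NS}}}{name}"


def build_record(title: str, author: str) -> ET.Element:
    """Wrap two MARC 21 fields (100 $a and 245 $a) in MARCXML elements."""
    record = ET.Element(q("record"))
    for tag, ind1, ind2, value in (("100", "1", " ", author),
                                   ("245", "1", "0", title)):
        # Each variable field keeps its numeric tag and two indicators as attributes.
        field = ET.SubElement(record, q("datafield"), tag=tag, ind1=ind1, ind2=ind2)
        sub = ET.SubElement(field, q("subfield"), code="a")
        sub.text = value
    return record


record = build_record("Manual of digital libraries", "Dhiman, Anil Kumar")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The tags, indicators and subfield codes are exactly those of the MARC record; only the physical wrapping has changed from ISO 2709 to XML.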


6.6.10. Metadata Encoding & Transmission Standard 
(METS) 


A newly devised standard, which refines and extends
the earlier Making of America II (MOA) system, METS is
designed specifically to encode descriptive, administrative, 
and structural metadata for objects within a digital library. One 
of the few systems designed specifically for digital libraries, it 
can fulfil all basic requirements of electronic collections, albeit
in a rather verbose and clumsy manner. METS has already
been used by a number of projects, including the
Harvard/Radcliffe Online Historical Reference Shelf, and
will undoubtedly become a standard for many projects. METS 
is written in XML Schema, a new way of describing XML 
systems, and so requires software that can handle this new 
format. METS depends on a complicated system of cross 
references within documents, and is, therefore, better 
generated automatically, instead of being manually edited. 


6.6.11. Metadata Object Description Schema (MODS) 


The Metadata Object Description Schema (MODS) is 
a descriptive metadata schema that is a derivative of MARC 
21 and intended to either carry selected data from existing 
MARC 21 records or enable the creation of original resource 
description records. It includes a subset of MARC fields and
uses language-based tags rather than the numeric ones used
in MARC 21 records. In some cases, it regroups elements from
the MARC 21 bibliographic format. Like METS, MODS is
expressed using the XML schema language. Although the
MODS standard can stand on its own, it may also complement 
other metadata formats. Because of its flexibility and use of 
XML, MODS may potentially be used as a Z39.50 Next 
Generation specified format, an extension schema to METS, 
metadata set for harvesting, and for creating original resource 
metadata records in an XML syntax. Rich description of 
electronic resources is a particular focus of MODS, which 
provides some advantages over other metadata. 
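The move from numeric MARC tags to language-based MODS tags can be sketched as a simple lookup. The table below is a tiny illustrative subset chosen for this example, not the full correspondence, and the function name `marc_to_mods` and the sample values are hypothetical.

```python
# Illustrative subset of MARC 21 (tag, subfield) -> MODS element paths.
MARC_TO_MODS = {
    ("100", "a"): "name/namePart",
    ("245", "a"): "titleInfo/title",
    ("260", "b"): "originInfo/publisher",
    ("650", "a"): "subject/topic",
}


def marc_to_mods(fields):
    """Translate (tag, subfield, value) triples into (MODS path, value) pairs.

    Fields without an entry in this small table are simply skipped.
    """
    return [
        (MARC_TO_MODS[(tag, code)], value)
        for tag, code, value in fields
        if (tag, code) in MARC_TO_MODS
    ]


marc_fields = [
    ("245", "a", "Manual of digital libraries"),
    ("260", "b", "Ess Ess Publications"),
    ("300", "a", "2 v."),  # physical description: not in this small table
]
mods_pairs = marc_to_mods(marc_fields)
```

A cataloguer reading "titleInfo/title" needs no table of tag numbers to understand it, which is the practical point of the language-based tags.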


6.7. METADATA FOR DATASETS 


Metadata schemes for datasets are particularly 
significant for libraries that specialize in subjects where 
numeric and statistical data are of great importance. One of 
the most well developed element sets is the Federal 
Geographic Data Committee’s (FGDC) Content Standard for 
Digital Geospatial Metadata (CSDGM), officially known as 
FGDC-STD-001-1998
(http://www.fgdc.gov/metadata/contstan.html). Geospatial datasets include topographic and
demographic data, GIS (Geographic Information Systems),
and computer-aided cartography base files. They are used
in a wide variety of areas, including soil and land use studies,
biodiversity counts, climatology and global change tracking,
remote sensing and satellite imagery.


A metadata scheme becoming well established in the 
social and behavioural sciences is the Data Documentation 
Initiative (DDI) standard for describing social science datasets
(http://www.icpsr.umich.edu/DDI/codebook.html). The DDI
is defined as an XML DTD, and allows for top-down
hierarchical description of a social science study, the data 
files resulting from that study, and the variables used in the 
data files. There is also a header area that uses Dublin Core 
elements for a high-level description of the DDI document 
itself. 


6.8. EXTENSIONS AND PROFILES 


Despite the recent development of many of these 
metadata schemes, most have already been subject to the 
changes brought about by implementing them in real world 
situations. These modifications are of two types: extensions
and profiles. An extension is the addition of elements to an 
already developed scheme to support the description of an 
information resource of a particular type or subject or to meet 
the needs of a particular interest group. Extensions increase 
the number of elements. 


Profiles are subsets of a scheme that are implemented 
by a particular interest group. The profiles can constrain the 
number of elements that will be used, refine element 
definitions to describe the specific types of resources more 
accurately and specify values that an element can take. 


In practice, many applications use both extensions and 
profiles of base metadata schemes. For example, the National
Biological Information Infrastructure (NBII), with support from
the Biological Resources Division of the U.S. Geological
Survey, has developed a biological profile to the FGDC
Content Standard for use with biological information resources
(http://www.nbii.gov/datainfo/metadata/standards/index.html).
The effort began by extending the standard with
elements important for the description of biological
resources, such as the taxonomic name of the organism and 
its classification in the taxonomic hierarchy. After the 
additional elements were agreed to, the group recommended 
a specific subset or profile that would be most useful to 
biologists. 


The U.S. Department of Education’s Gateway to 
Educational Materials (GEM) project has similarly based its
own metadata scheme on the Dublin Core
(http://www.geminfo.org/Workbench/Metadata/index.html). The
GEM profile limits elements to be used (for example,
Contributor is not allowed) and makes some elements
mandatory. GEM also defines additional elements such as 
Audience, Grade, Quality and Standards, extending the base 
Dublin Core set for educational use. 
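How an extension and a profile combine can be sketched with ordinary sets. The example takes the fifteen Dublin Core elements as the base, adds the four extension elements named above, and excludes Contributor. The text does not say which elements GEM makes mandatory, so the `MANDATORY` set below is purely hypothetical, as are the function and variable names.

```python
from typing import Dict, List

DUBLIN_CORE = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language", "relation",
    "coverage", "rights",
}

EXTENSION = {"audience", "grade", "quality", "standards"}  # elements added
EXCLUDED = {"contributor"}                                 # elements the profile forbids
MANDATORY = {"title", "description"}                       # hypothetical choice for this sketch

ALLOWED = (DUBLIN_CORE | EXTENSION) - EXCLUDED


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found when checking a record against the profile."""
    problems = []
    for name in sorted(set(record) - ALLOWED):
        problems.append(f"element not allowed: {name}")
    for name in sorted(MANDATORY - set(record)):
        problems.append(f"mandatory element missing: {name}")
    return problems


good = {"title": "Fractions with tiles", "description": "Lesson plan", "grade": "5"}
bad = {"title": "Fractions with tiles", "contributor": "A. Teacher"}
print(validate(good))
print(validate(bad))
```

The extension enlarges the set of usable elements, while the profile narrows it and tightens the rules; a single application can do both at once.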


6.9. FRAMEWORKS FOR INTEROPERABILITY AND 
EXCHANGE 


Do we need so many metadata standards and 
initiatives? Cannot one standard serve the purpose? Are
extensions and profiles really necessary? It is important to 
remember that different schemes serve distinct needs and 
audiences. Complementary schemes can be used to describe 
the same resource for multiple purposes serving a number 
of user groups. For example, a technical report could have a 
MARC metadata set in a library’s online catalogue, an FGDC 
description as part of the National Spatial Data Infrastructure 
Clearinghouse Mechanism, and an embedded set of Dublin
Core elements. Practical aspects of this complex environment
are being investigated by several groups.


The SCHEMAS project of the UK Office for Library and 
Information Networking (UKOLN) is a forum for the 
implementers of metadata schemes
(http://www.ukoln.ac.uk/metadata/schemas/) that aims to provide information about
new and emerging metadata standards and to promote
“good-practice guidelines for adapting multiple standards or
metadata modules for local use in customized schemas”.


6.9.1. Resource Description Framework 


The Resource Description Framework (RDF), 
developed by the World Wide Web Consortium (W3C), is a 
data model for the description of resources on the Web that 
also provides a mechanism for integrating multiple metadata 
schemes (http://www.w3.org/RDF/).

In RDF, a namespace is defined by a URL pointing to
a Web resource that describes the metadata scheme that is
used in the description. Multiple namespaces can be defined,
allowing elements from different schemes to be combined in
a single resource description. Multiple descriptions, created
at different times for different purposes, can also be linked to
each other. RDF is generally expressed in XML.
Another project related to the interoperability of metadata is
OCLC’s Cooperative Online Resource Catalog (CORC)
project (http://www.oclc.org/corc/).

The Interoperability of Data in E-commerce Systems
(INDECS) Framework (http://www.indecs.org) was an
international collaborative effort supported by the European
Commission's info2000 Programme. The collaborators were
major rights owners, such as publishers and members of the
recording industry, who wanted to develop a framework for
metadata standards to support network commerce in
intellectual property.




The foundation of the INDECS work is a data model for 
Intellectual Property and its transfer. Rather than developing 
a new metadata scheme, INDECS sought to develop a 
common framework to allow various schemes for transactions 
related to different genres such as music, journal articles, 
and books to be able to interchange information, particularly 
that related to intellectual property rights. In order to support 
this common framework, INDECS has developed a minimal kernel
of required metadata. Several pilot projects are under way to 
validate the metadata kernel. The framework and other 
metadata are being implemented in several major projects 
involving books and audiovisual materials. 


6.9.2. Basic RDF Model 


The RDF model draws on well-established principles 
from various data representation communities. RDF 
properties may be thought of as attributes of resources and 
in this sense correspond to traditional attribute - value pairs. 
RDF properties also represent relationships between 
resources and an RDF model can therefore resemble an 
entity-relationship diagram. In object-oriented design 
terminology, resources correspond to objects and properties 
correspond to instance variables. 


The RDF data model is a syntax-neutral way of 
representing RDF expressions. The data model 
representation is used to evaluate equivalence in meaning. 
Two RDF expressions are equivalent if and only if their data 
model representations are the same. This definition of 
equivalence permits some syntactic variation in expression 
without altering the meaning.


The basic data model consists of three object types: 


Resources : All things being described by RDF
expressions are called resources. A resource may be an
entire Web page, such as the HTML document
“http://www.w3.org/Overview.html”. A resource may be
a part of a Web page, e.g., a specific HTML or XML element
within the document source. A resource may also be a whole
collection of pages, e.g., an entire Web site. A resource may
also be an object that is not directly accessible via the Web,
e.g., a printed book. Resources are always named by URIs
plus optional anchor ids. Anything can have a URI; the
extensibility of URIs allows the introduction of identifiers for
any entity imaginable.

Properties : A property is a specific aspect, 
characteristic, attribute, or relation used to describe a 
resource. Each property has a specific meaning, defines its 
permitted values, the types of resources it can describe, and 
its relationship with other properties. This document does not 
address how the characteristics of properties are expressed. 

Statements : A specific resource together with a named 
property plus the value of that property for that resource is an 
RDF statement. These three individual parts of a statement 
are called, respectively, the subject, the predicate, and the 
object. The object of a statement (i.e., the property value) 
can be another resource or it can be a literal; i.e., a resource 
(specified by a URI) or a simple string or other primitive 
datatype defined by XML. In RDF terms, a literal may have
content that is XML markup but is not further evaluated by 
the RDF processor. There are some syntactic restrictions on 
how markup in literals may be expressed. 


Resources are identified by a resource identifier. A 
resource identifier is a URI plus an optional anchor id. For 
the purposes of this section, properties will be referred to by 
a simple name. 

Consider a simple example sentence: 

Ora Lassila is the creator of the resource
http://www.w3.org/Home/Lassila.

This sentence has the following parts:




Subject (Resource)     http://www.w3.org/Home/Lassila
Predicate (Property)   Creator
Object (Literal)       “Ora Lassila”


The RDF statement can be represented pictorially (Fig.
6.1) using directed labeled graphs. These are also called
“nodes and arcs diagrams”. In these diagrams, the nodes
(drawn as ovals) represent resources and arcs represent
named properties. Nodes that represent string literals will be
drawn as rectangles. The sentence above would thus be
diagrammed as:

(http://www.w3.org/Home/Lassila) --Creator--> [Ora Lassila]

Fig. 6.1. Simple Node and Arc Diagram


The direction of the arrow is important. The arc always 
starts at the subject and points to the object of the statement. 
The simple diagram above may also be read “http:// 
www.w3.org/Home/Lassila has creator Ora Lassila”, or in 
general “<subject> HAS <predicate> <object>”. 
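The three parts of a statement map naturally onto a three-field record. The following Python sketch is one possible way to hold a statement in memory; the class name `Statement` and the method `read` are assumptions of this example and are not part of RDF.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # the resource being described (a URI)
    predicate: str  # the named property
    obj: str        # the property value: a literal or another resource

    def read(self) -> str:
        """Render the statement in the '<subject> has <predicate> <object>' form."""
        return f"{self.subject} has {self.predicate} {self.obj}"


stmt = Statement("http://www.w3.org/Home/Lassila", "Creator", "Ora Lassila")
print(stmt.read())
```

The order of the fields mirrors the direction of the arc: it starts at the subject and ends at the object.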

Now, consider the case that we want to say something 
more about the characteristics of the creator of this resource. 
In prose, such a sentence would be: 

The individual whose name is Ora Lassila, email
<lassila@w3.org>, is the creator of
http://www.w3.org/Home/Lassila.

The intention of this sentence is to make the value of 
the Creator property a structured entity. In RDF such an entity 
is represented as another resource. The sentence above does 
not give a name to that resource; it is anonymous, so in
Fig. 6.2 we represent it with an empty oval:

(http://www.w3.org/Home/Lassila) --Creator--> ( )
( ) --Name--> [Ora Lassila]
( ) --Email--> [lassila@w3.org]

Fig. 6.2. Property with Structured Value


Corresponding to the reading given above, this
diagram could be read “http://www.w3.org/Home/Lassila has
creator something and something has name Ora Lassila and
email lassila@w3.org”.


The structured entity of the previous example can also 
be assigned a unique identifier. The choice of identifier is 
made by the application database designer. To continue the 
example, imagine that an employee id is used as the unique 
identifier for a “person” resource. The URIs that serve as the 
unique keys for each employee might then be something like 
http://www.w3.org/staffId/85740. Now we can write the two
sentences: 


The individual referred to by employee id 85740 is 
named Ora Lassila and has the email address 
lassila@w3.org. The resource
http://www.w3.org/Home/Lassila was created by this individual.


The RDF model for these sentences is:

(http://www.w3.org/Home/Lassila) --Creator--> (http://www.w3.org/staffId/85740)
(http://www.w3.org/staffId/85740) --Name--> [Ora Lassila]
(http://www.w3.org/staffId/85740) --Email--> [lassila@w3.org]

Fig. 6.3. Structured Value with Identifier


Note that this diagram (Fig. 6.3) is identical to the 
previous one with the addition of the URI for the previously 
anonymous resource. From the point of view of a second 
application querying this model, there is no distinction 
between the statements made in a single sentence and the 
statements made in separate sentences. Some applications 
will need to be able to make such a distinction however, and 
RDF supports this. 
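The relationship between the anonymous and the identified form of a structured value can be sketched with plain tuples. In this illustration the "_:" prefix for an unnamed node is a labelling convention adopted for the sketch, and the helper names are invented; only the URIs and literal values come from the example above.

```python
import itertools

_counter = itertools.count(1)


def anonymous_node() -> str:
    """Return a fresh local label for a resource that has no URI of its own."""
    return f"_:node{next(_counter)}"


def describe_creator(page: str, person: str):
    """Three statements: the page's creator, then that creator's name and email."""
    return [
        (page, "Creator", person),
        (person, "Name", "Ora Lassila"),
        (person, "Email", "lassila@w3.org"),
    ]


def shape(triples):
    """Keep predicates and literal values, ignoring how the person node is named."""
    person = triples[0][2]
    return [(p, "<person>" if o == person else o) for _, p, o in triples]


PAGE = "http://www.w3.org/Home/Lassila"
anonymous = describe_creator(PAGE, anonymous_node())
identified = describe_creator(PAGE, "http://www.w3.org/staffId/85740")
print(shape(anonymous) == shape(identified))
```

Both graphs carry the same three statements; the only difference is whether the intermediate resource can be referred to from outside.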


The RDF data model provides an abstract, conceptual 
framework for defining and using metadata. A concrete syntax 
is also needed for the purposes of creating and exchanging 
this metadata. This specification of RDF uses the Extensible 
Markup Language [XML] encoding as its interchange syntax. 
RDF also requires the XML namespace facility to precisely
associate each property with the schema that defines the
property.


The sentence “Ora Lassila is the creator
of the resource http://www.w3.org/Home/Lassila” is
represented in RDF/XML as:

<rdf:RDF>
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>


The complete XML document containing the description
above would be:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:s="http://description.org/schema/">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>
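To see how a program might recover the statement from such a document, the sketch below reads it with Python's standard XML parser. It handles only this simplest form (a Description element with an "about" attribute and literal-valued child elements) and is not a general RDF parser; the function name `statements` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOCUMENT = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:s="http://description.org/schema/">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>
"""


def statements(xml_text: str):
    """Return (subject, predicate, object) tuples found in a simple RDF/XML document."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get("about")
        for prop in description:
            # ElementTree expands the prefix, so the predicate carries its
            # schema namespace: "{http://description.org/schema/}Creator".
            triples.append((subject, prop.tag, prop.text))
    return triples


for triple in statements(DOCUMENT):
    print(triple)
```

The expanded property name shows why the namespace declarations matter: "Creator" is tied to the schema that defines it rather than being a bare word.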


6.9.3. Warwick Framework 


The Warwick Framework is a container architecture 
formed as a result of the Warwick workshop. The framework 
is a mechanism for aggregating logically, and perhaps 
physically, distinct packages of metadata. This is a 
modularization of the metadata issue with a number of notable 
characteristics. 

• It allows the designers of individual metadata sets to
focus on their specific requirements, without concerns
for generalization to ultimately unbounded scope.

• It allows the syntax of metadata sets to vary in
conformance with semantic requirements, community
practices, and functional requirements for the kind of
metadata in question.

• It separates management of and responsibility for
specific metadata sets among their respective
“communities of expertise”.

• It promotes interoperability by allowing tools and agents
to selectively access and manipulate individual
packages and ignore others.

• It permits access to the different metadata sets that
are related to the same object to be separately
controlled.

• It flexibly accommodates future metadata sets by not
requiring changes to existing sets or the programs that
make use of them.


The separation of metadata sets into packages does 
not imply that packages are completely semantically distinct. 
In fact, it is a feature of the Warwick Framework that an 
individual container may hold packages, each managed and 
maintained by distinct parties, which have complex semantic 
overlap. 


The Warwick Framework has two fundamental
components. A container is the unit for aggregating the
typed data and metadata sets, which are known as packages.


A container may be either transient or persistent. In its 
transient form, it exists as a transport object between and 
among repositories, clients, and agents. In its persistent form, 
it exists as a first-class object in the information infrastructure. 
That is, it is stored on one or more servers and is accessible 
from these servers using a globally accessible identifier (URI). 
We note that a container may also be wrapped within another
object, i.e., one that is a wrapper for both data and metadata.
In this case the “wrapper” object will have a URI rather than
the metadata container itself.




Independent of the implementation, the only operation 
defined for a container is one that returns a sequence of 
packages in the container. There is no provision in this 
operation for ordering the members of this sequence and thus 
no way for a client to assume that one package is more 
significant or “better” than another. At the container level, each 
package is an opaque bit stream. One implication of these properties
is that any encoding for a container must allow the recipient 
of the container to skip over unknown packages within the 
container. 


Each package is a typed object and its type may be 
inferred after access by a client or agent. The packages are 
of three types: 


Metadata Set : These are packages that contain actual 
metadata. Some examples of this are packages that are 
MARC records, Dublin Core records, and encoded terms and 
conditions. A potential problem is the ability of clients and 
agents to recognize and process the semantics of the many 
metadata sets. In addition, clients and agents will need to 
adapt to new metadata types as they are introduced. Initial 
implementations of the Warwick Framework will probably 
include a set of well known metadata sets, in the same manner 
that most Web browsers have native handlers for a set of 
well-known MIME types. Extending the Framework 
implementations to handle an extensible set of metadata sets will
rely on a type registry scheme. 


Indirect : This is a package that is an indirect reference 
to another object in the information infrastructure. While the 
indirection could be done using URLs, we emphasize that 
the existence of a reliable URL implementation is necessary
to avoid the problems of dangling references that plague the 
Web. We note three possibly obvious, but important, points 
about this indirection. First, the target of the indirect package 
is a first-class object, and thus may have its own metadata
and, significantly, its own terms and conditions for access.
Second, the target of the indirect package may also be 
indirectly referenced by other containers. Finally, the target 
of the indirection may be in a different repository or server 
than the container that references it. 


Container : This is a package that is itself a container. 
There is no defined limit for this recursion. 
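The container and its three package types can be modelled in a few lines of Python. This is a sketch under stated assumptions, not an implementation of the Framework: the class names, the `kind` labels, the `KNOWN_KINDS` set and the example URI are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class MetadataSet:
    kind: str       # e.g. "dublin-core" or "marc" (labels assumed for this sketch)
    payload: bytes  # treated as an uninterpreted bit stream by the container


@dataclass
class Indirect:
    uri: str        # reference to an object held elsewhere


@dataclass
class Container:
    members: List["Package"] = field(default_factory=list)

    def packages(self) -> Iterator["Package"]:
        """The only container-level operation: return the packages it holds."""
        return iter(self.members)


Package = Union[MetadataSet, Indirect, Container]

KNOWN_KINDS = {"dublin-core", "marc"}  # what this hypothetical client understands


def usable(container: Container):
    """Yield the metadata sets a client understands, descending into nested containers."""
    for package in container.packages():
        if isinstance(package, Container):
            yield from usable(package)  # recursion has no defined depth limit
        elif isinstance(package, MetadataSet) and package.kind in KNOWN_KINDS:
            yield package
        # Indirect packages and unknown kinds are skipped, not treated as errors.


box = Container([
    MetadataSet("dublin-core", b"(record bytes)"),
    MetadataSet("marc", b"(record bytes)"),
    Indirect("http://example.org/terms-and-conditions"),  # hypothetical URI
    Container([MetadataSet("unknown-scheme", b"?"), MetadataSet("marc", b"(nested)")]),
])
print([p.kind for p in usable(box)])
```

Because the client simply passes over packages it does not recognise, a new metadata set can be added to a container without breaking existing software.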


Figure 6.4 shows a simple example of a
Warwick Framework container. The container in this example
contains three logical packages of metadata. The first two, a
Dublin Core record and a MARC record, are contained within
the container as a pair of packages. The third metadata set,
which defines the terms and conditions for access to the 
content object, is referenced indirectly via a URI in the 
container. 


Container
    Package: Dublin Core
    Package: MARC Record
    Package: Indirect --> Package: Terms and conditions

Fig. 6.4. Warwick Framework Container

The mechanisms for associating a Warwick Framework
container with a content object (i.e., a document) depend on
the implementation of the Framework.


The reverse linkage, which ties a container to a piece of 
intellectual content, is also relevant. Anyone can, in fact, 
create descriptive data for a networked resource, without 
permission or knowledge of the owner or manager of that 
resource. This metadata is fundamentally different from the 
metadata that the owner of a resource chooses to link or 
embed with the resource. We, therefore, informally distinguish 
between two categories of metadata containers, which both 
have the same implementation. 


An internally-referenced metadata container is the 
metadata that the author or maintainer of a content object 
has selected as the preferred description(s) for the object. 
This metadata is associated with the content by either 
embedding it as part of the structure that holds the content or 
referencing it via a URI. An internally-referenced metadata 
container referenced via a URI is, by nature, a first-class 
networked object, and may have its own metadata container 
associated with it. In addition, an internally-referenced 
metadata container may back-reference the content that it 
describes via a linkage metadata element within the container. 


An externally-referenced metadata container is 
metadata that has been created and is maintained by an 
authority separate from the creator or maintainer of the 
content object. In fact, the creator of the object may not even 
be aware of this metadata. There may be an unlimited number 
of such externally-referenced metadata containers. For 
example, libraries, indexing services, ratings services, and 
the like may compose sets of metadata for content objects 
that exist on the net. As stated earlier, these 
externally-referenced metadata containers are themselves first-class 
network objects, accessible through a URI and having some 
associated metadata. The linkage to the content that one of 
these externally-referenced containers purports to describe 
will be via a linkage metadata element within the container. 
There is no requirement, nor is it expected, that the content 
object will reference these externally-referenced containers 
in any way. 




Fig. 6.5 shows an example of this relationship. 
Three metadata containers are shown. The one 
internally-referenced metadata container is embedded in the content 
object; it does not have a URI, nor does it have a linkage 
package that references the content. The two 
externally-referenced metadata containers are independent objects. 
They each have a URI and reference the content object via 
its URI. 


The internally-referenced metadata container in this 
illustration could also be indirectly referenced by the content. 
In this case it would have its own URI (say URI4) and would 
have a linkage package referencing URI3, the content. 


There are some open issues in the Warwick 
Framework. Time at the Warwick workshop did not permit a 
full exploration of all the issues involved in the proposed 
framework. There are several topics that urgently call for more 
detailed and extended examination prior to finalizing the 
framework. Some of the issues are discussed here. 


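[Figure: a content object with an embedded internally-referenced 
metadata container, and two externally-referenced metadata 
containers, each with its own URI, that reference the content 
object by its URI.] 

Fig. 6.5. Metadata Relationship 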




Semantic Interaction of Overlapping Sets : Certainly 
the most fundamental question about the Warwick Framework 
is the semantic interaction and overlap of the multiple 
metadata sets that may exist in a container. While packages 
are to some extent logically distinct, they may have semantics 
that overlap in complex ways. For example, a container may 
contain two descriptive cataloguing metadata packages—one 
MARC and the other Dublin Core. A more complex example 
is a container that contains multiple terms and conditions 
metadata sets at different levels of recursion in a container. 
In the end, the semantics of the metadata associated with an 
object need to be understood by the “consumers” of the 
metadata - the clients and agents that access objects and 
the users that configure these clients and agents. We run the 
danger, with the full expressiveness of the Warwick 
Framework, of creating such complexity that the metadata is 
effectively useless. Finding the appropriate balance is a 
central design problem. 


Type Registry : The Framework design requires that 
packages are strongly typed. An agent or client will be able 
to determine the type of the metadata in a package; so 
definers of specific metadata sets should ensure that the set 
of operations and semantics of those operations will be strictly 
defined for a package of a given type. It is expected that a 
limited set of metadata types will be widely used and 
“understood” by browsers and agents. However, the type 
system must be extensible, and some method that allows 
existing clients and agents to process new types must be a 
part of a full implementation of the Framework. 
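The idea of an extensible type system can be illustrated with a minimal registry that maps a metadata type name to a handler. This is a sketch under stated assumptions: the `TypeRegistry` class, the type names and the handlers are all hypothetical, and the Framework itself does not prescribe any such interface.

```python
from typing import Callable, Dict

# A handler turns the raw content of a package into a displayable summary.
Handler = Callable[[dict], str]


class TypeRegistry:
    """Maps metadata type names to handlers; types can be added at run time."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, metadata_type: str, handler: Handler) -> None:
        self._handlers[metadata_type] = handler

    def process(self, metadata_type: str, content: dict) -> str:
        handler = self._handlers.get(metadata_type)
        if handler is None:
            # Unknown types are reported rather than guessed at.
            return f"unrecognized metadata type: {metadata_type}"
        return handler(content)


registry = TypeRegistry()
registry.register("dublin-core", lambda c: f"DC title: {c.get('title', '')}")

print(registry.process("dublin-core", {"title": "Evangeline"}))
print(registry.process("terms-and-conditions", {}))

# A new type is introduced later without changing the client code above.
registry.register(
    "terms-and-conditions",
    lambda c: f"access: {c.get('access', 'unknown')}",
)
print(registry.process("terms-and-conditions", {"access": "open"}))
```

The point of the design is that the client only ever calls `process`; support for a new metadata type arrives as a registration, not as a change to the client.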


Data Encoding : The Warwick Framework presents 
two data encoding problems. At the container level, what is 
the syntax for transferring sets of packages? This syntax must 
be independent from the syntax of the packages themselves, 
which are opaque at this level. The more difficult data 
encoding problems exist at the package level. Some metadata 
sets can be adequately expressed in ASCII, as a set of 
attribute/value pairs. Others require more expressive syntax; 
for example, rules that describe the terms and conditions for 
access to an object are best expressed via some type of 
executable program or agent. There is a need to agree on 
one or more syntaxes for the various metadata sets. 


Efficiency : The power of the Warwick Framework lies 
in its recursive and distributed characteristics. This lends great 
power to the model, but in an actual implementation may be 
quite inefficient. Even in the context of the relatively simple 
World Wide Web, the Internet is often unbearably slow and 
unreliable. Connections often fail or time out due to high load, 
server failure, and the like. In a full implementation of the 
Warwick Framework, access to a “document” might require 
negotiation across distributed repositories. The performance 
of this distributed architecture is difficult to predict and is prone 
to multiple points of failure. Efficient operation of this 
distributed architecture will depend on an improved network 
infrastructure using caching, data or object replication, 
dynamic load balancing, and other methods being examined 
in distributed systems research. 


Repository Access : It is clear that some protocol work 
will need to be done to support container and package 
interchange and retrieval. We foresee the need for various 
forms of retrieval. The simple form is retrieval of a container 
for an object. A more complex form is retrieval of only those 
containers that include packages of a specific set of types. 
The requirements for this protocol have not been explored in 
any detail. Some examination of the relationship between the 
Warwick Framework and ongoing work in repository 
architectures would likely be fruitful. 


6.10. METADATA CROSSWALKS 


The interoperability and exchange of metadata is 
facilitated by metadata crosswalks. A crosswalk is a mapping 
of the elements, semantics and syntax from one metadata 
scheme to those of another. A crosswalk allows metadata 
created by one community to be used by another group that 
employs a different metadata standard. The degree to which 
these crosswalks are successful at the individual record level 
depends on the similarity of the two schemes, the granularity 
of the elements in the target scheme compared to that of the 
source, and the compatibility of the content rules used to fill 
the elements of each scheme. 


A crosswalk is a set of transformations applied to the 
content of elements in a source metadata standard that result 
in the storage of appropriately modified content in the 
analogous elements of a target metadata standard. A 
complete or fully specified crosswalk takes care of semantic 
mapping as well as metadata conversion. The metadata 
conversion specifications contain instructions for transferring 
the content of a field or sub-field defined in the source metadata 
standard into the corresponding field or sub-field of the target 
metadata standard. 
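A crosswalk of this kind can be sketched as a mapping table plus a conversion routine. The sketch below assumes a record held as a dictionary of Dublin Core element names; the `DC_TO_MARC` table is illustrative only, and the field and subfield pairings should be checked against a published crosswalk before any real use.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping table: Dublin Core element -> (MARC field, subfield).
# The pairings are illustrative, not an authoritative crosswalk.
DC_TO_MARC: Dict[str, Tuple[str, str]] = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def crosswalk(
    dc_record: Dict[str, List[str]],
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Convert a Dublin Core record to (field, subfield, value) triples.

    Returns the converted triples together with the names of the source
    elements that have no target in the mapping table.
    """
    converted = []
    unmapped = []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            unmapped.append(element)
            continue
        marc_field, subfield = target
        for value in values:
            converted.append((marc_field, subfield, value.strip()))
    return converted, unmapped


record = {
    "title": ["Manual of Digital Libraries"],
    "creator": ["Dhiman, Anil K.", "Rani, Yashoda"],
    "rights": ["All rights reserved"],
}
fields, unmapped = crosswalk(record)
for triple in fields:
    print(triple)
print("unmapped:", unmapped)
```

Reporting the unmapped elements, rather than silently dropping them, makes visible where the two schemes do not line up.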


A metadata crosswalk is a software tool that 
incorporates specifications for mapping one metadata 
standard to another. It enables transfer of the content of fields / 
sub-fields defined in one metadata standard into the 
corresponding fields / sub-fields of another metadata 
standard. Development of a crosswalk utility, therefore, requires 
in-depth knowledge of and expertise in the associated metadata 
standards. Given that metadata standards themselves 
are often developed independently and are targeted to a 
specific community of users, developing expertise in different 
metadata standards and developing crosswalks for them is 
a challenging task. Moreover, maintaining crosswalks as the 
metadata standards change over a period of time becomes 
even more problematic due to the need to sustain a historical 
perspective and ongoing expertise in the associated 
standards. 


Crosswalks are important for virtual libraries where 
resources are being collected from a variety of sources and 
are expected to act as a whole, perhaps with a single search 
engine applied. While these crosswalks are key, they are also 
labour intensive to develop and maintain. Mapping 
schemes with fewer elements (less granularity) to those with 
more elements (more granularity) is particularly problematic. 


6.11. METADATA REGISTRIES 


Registries are an important tool for managing metadata. 
Metadata registries can provide information on the definition, 
origin, source, and location of data. Registration can apply at 
many levels, including schemes, usage profiles, metadata 
elements, and code lists for element values. Registries can 
document the meaning and use of the elements in a single 
metadata scheme as they change over time, or the way the 
same elements have been used in different applications. 
Registries can also document element meanings in multiple 
schemes or databases, particularly within a specific field of 
interest such as health care, aeronautics, or environmental 
science. 


A good example is the U.S. Environmental Protection 
Agency's Environmental Data Registry (http://www.epa.gov/edr/), 
which provides information about thousands of data 
elements used in current and legacy EPA databases. The 
metadata registry provides an integrating resource for legacy 
data, acts as a lookup tool for designers of new databases, 
and documents each data element. Standards relevant to 
metadata registries include ISO/IEC 11179 Specification and 
Standardization of Data Elements (a joint standard of the 
International Organization for Standardization and the 
International Electrotechnical Commission) and ANSI 
X3.285, Metamodel for the Management of Shareable Data. 
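What a registry records for each data element can be pictured with a small in-memory sketch. It assumes a much simpler model than the standards named above define: the `ElementEntry` and `MetadataRegistry` classes, the element name and the database names are hypothetical and serve only to show definitions being kept over time and looked up by source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ElementEntry:
    """One registered version of a data element."""
    name: str
    definition: str
    source: str                                        # originating scheme or database
    used_in: List[str] = field(default_factory=list)   # applications that use it


class MetadataRegistry:
    """Keeps every registered version of each element, oldest first."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[ElementEntry]] = {}

    def register(self, entry: ElementEntry) -> None:
        # Older versions are retained so changes in meaning stay documented.
        self._entries.setdefault(entry.name, []).append(entry)

    def current(self, name: str) -> ElementEntry:
        return self._entries[name][-1]

    def history(self, name: str) -> List[ElementEntry]:
        return list(self._entries[name])

    def lookup_by_source(self, source: str) -> List[str]:
        """Names of elements whose current version comes from `source`."""
        return sorted(
            name
            for name, versions in self._entries.items()
            if versions[-1].source == source
        )


element_registry = MetadataRegistry()
element_registry.register(ElementEntry(
    "station_id", "Identifier of a monitoring station", "legacy-water-db"))
element_registry.register(ElementEntry(
    "station_id", "Nationally assigned identifier of a monitoring station",
    "water-quality-db", ["water-quality-db", "air-quality-db"]))

print(element_registry.current("station_id").definition)
print(len(element_registry.history("station_id")), "versions on record")
```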


6.12. METADATA CREATION 


Who creates metadata? The answer to this varies by 
discipline, the resource being described, the tools available, 
and the expected outcome, but it is almost always a 
cooperative effort. Much basic structural and administrative 
metadata is supplied by the technical staff who initially digitize 
or otherwise create the digital object. 


A meta resource consists of information resources on 
the given topics that are organized to facilitate selective 
access to qualitative and authoritative information. A meta 
resource consisting only of links to the information resources 
defeats the purpose of its existence. The context in which an 
information resource is included is extremely important. Just 
like a bibliographic record in a catalogue contains a few 
essential elements that describe the book (author, title, 
publisher, etc.), each information resource included in a meta 
resource should contain at least a few elements to adequately 
describe it. The elements used for describing an Internet 
information resource are as follows: 


— Title/Name of the Resource, 

— URL of Resource, 

— Source / Author / Creator, 

— Authority of Source, 

— Publisher, 

— Last Update / Update Frequency, 

— Expiration Date (if applicable), 

— Size / Volume, especially for software, sound, image 
and movie files, 


— Intended Audience, if the resource is appropriate for 
students, faculty, staff, children or any other audience, 


— Subject and Keywords, and 


— Description for the scope, aims, and goals of the 
resource. 
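The elements listed above can be sketched as a simple record structure. This is only one possible representation: the `ResourceRecord` class, its field names and the sample values are assumptions made for the example.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ResourceRecord:
    """Descriptive record for one Internet information resource."""
    title: str
    url: str
    creator: str
    authority: str = ""
    publisher: str = ""
    last_update: str = ""
    expiration_date: Optional[str] = None   # only if applicable
    size: str = ""
    audience: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    description: str = ""

    def missing_elements(self) -> List[str]:
        """Names of the elements that are still empty (optional ones excluded)."""
        optional = {"expiration_date"}
        return [
            name
            for name, value in asdict(self).items()
            if name not in optional and not value
        ]


# Placeholder values; example.org is not a real subject gateway.
sample = ResourceRecord(
    title="Example Subject Gateway",
    url="http://www.example.org/",
    creator="Example University Library",
    subjects=["library science"],
)
print(sample.missing_elements())
```

A check such as `missing_elements` gives a cataloguer a quick view of how complete a description is before the resource is added to the meta resource.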


To reach such resources, a user needs to connect to the 
Internet via a service provider. Meta resources, or 
portals, are an increasingly common sight on the web. For 
users who are connected to the Internet via a service provider 
such as MSN and AOL in the USA or Mantra Online or Sigma 
Online in India, their service provider’s portal is likely to be 
the first thing to greet them on the web, and there is a real 
effort on the part of the service provider to make its portal 
sufficiently attractive for the user to use the services and 
content provided in or through the portal, or provided by one 
of its partners. Besides the commercial portals mentioned 
above, the concept of portals or meta resources has found 
application in different fields and at various levels. Meta 
resources may be grouped into the following three 
categories: 


Commercially-financed Portals : These are available 
freely, with an aim to hook a user to the site by providing 
access to a range of useful proprietary services (e-mail, etc.) 
and some prescribed content (news feeds, reference material, 
etc.). Examples of commercially financed portals are Microsoft 
Network (MSN), Yahoo!, Excite, Netscape, Lycos, CNET, and 
America Online’s AOL.com. Subject-specific meta-resources 
may be backed by commercial publishers (Elsevier's 
BioMedNet) and scholarly societies (CAS’s ChemWeb). 


Institutional Portals : These are principally focused 
upon providing access to local functions and services, but 
are also capable of pointing to external content. Institutional 
portals seek to fulfil a range of functions: providing news feeds 
of interest to the institution, linking to management 
systems in order to track fees, payments, etc., and integrating 
to a degree with back-end databases handling time-tabling, 
room bookings and the like in order to remind students where 
they need to be and when. Several libraries have established 
one or more meta resources to provide access to electronic 
resources via the Internet through the library’s website. 
Several meta resources are format-specific and provide 
access to free or fee-based resources of different intellectual 
formats including e-encyclopedias, e-dictionaries, e-books, 
e-journals, software, online electronic databases, sounds, 
images, etc. 


Virtual Learning Environments (VLEs) and Managed 
Learning Environments (MLEs) : These refer to the 
components in which learners and tutors participate in 
“on-line” interactions of various kinds, including on-line learning. 
Virtual Learning Environments are increasingly becoming an 
important part of the educational system for delivering online 
and flexible learning. The IGNOU programmes BIT and ADIT are 
examples of this category. 


A meta resource site may be subject-specific or 
format-specific. Several subject-specific or format-specific meta 
resources that are available on the web offer a single entry 
point to resources in a given topic or set of topics. A meta 
resource may be of the “umbrella” type, i.e., consisting of a single 
collection that provides access to all kinds of information 
resources, or it can be segmented, customized or re-purposed 
into versions that serve different patrons differently, each of 
which may vary in focus and / or presentation. The metadata that describes 
the individual collections in the meta resources may be stored 
locally or remotely; it may be published by a third party or be 
custom-generated by the library. 




6.13. METHODOLOGY FOR BUILDING UP META 
RESOURCES 


The process of building-up a meta-resource site 
requires distinct involvement of a subject expert and an 
information specialist. It is desirable that a subject expert 
describes the information resources for a meta resource site 
to reflect it most aptly in representation systems such as 
contents and indexes. However, it would require the services 
of an information professional to organize the resources, since 
it is the information specialist who is well versed in the 
intricacies involved in the design and development of 
information retrieval systems, be it for achieving 
standardization through vocabulary control or for deciding the 
coordination of terms in the subject headings which support 
searches, etc. 


A meta resource should have a well-defined policy for 
selection, evaluation and description of information resources 
available through it. It should have a consistent collection 
development policy. The information resources should meet 
the user’s needs and fit the mission of the meta resource 
developer. The information resource should be authoritative 
and should cover the desired depth and breadth of the subject 
area. The steps involved in the process of building up a meta 
resource include: 


6.13.1. Study of Subjects 


The information professional involved in the process of 
building up a meta-resource needs to have functional 
knowledge of the subject, its structure, terms and available 
resources in the area. A systematic way to get a functional 
knowledge of a subject is to conduct a study of the subject. 
The steps involved in it include taking stock of the historical 
background of the given subject and its terminological 
development. The structure of the subject can be determined 
by the way the subject has been treated in different 
classification systems. It also involves study of available 
sources on the subject and their classification. This is essential 
in organizing the reference material, articles and other sources 
in a useful order. At the end of the exercise, an information 
specialist will have essential insight and enough knowledge 
of a subject to be able to proceed with organization of 
information resources. 


6.13.2. Identification, Selection and Evaluation of 
Information Resources 


Identification of information resources is the first step 
in developing a meta resource. Search engines, mailing lists, 
Usenet User Groups, directories and other meta resources 
sites may be used for identifying information resources that 
may be considered for selection for a meta resource. 
Newspapers, magazines and scholarly journals may also be 
used for identification of information resources for a meta 
resource site. Free e-mail subscriptions to newsletters such as 
Virtual Acquisition Shelf and News Desk 
(http://www.resourceshelf.blogspot.com/), What's New on Academic 
Info (http://www.academicinfo.net/new.html), and LII New This 
Week (http://www.lii.org/search/ntw) can also be used to 
get links to new resources. 


Selective information resources are much easier to 
maintain, and provide more value to users than those that 
are less discriminating. The selection of an information 
resource for a meta resource site would be based on the 
process of evaluation described in the next step. 


Internet resources differ from print-based resources 
largely in their presentation and the way they can be accessed. 
However, the criteria employed for evaluation of information 
resources are not different from those used for printed 
resources. 


6.14. ORGANIZATION OF META RESOURCES 


Information resources need to be presented in an 
organized and structured fashion so as to add value to the 
resources included in the meta resource. Even the most basic 
set of meta-resources has an organization system 
(alphabetical), a navigation system (embedded in the meta 
site or using the browser’s features such as “Back” and 
“Forward”), and a labelling system comprising names of 
resources and names of meta sites. In addition, a 
meta-resource site should also have a feedback mechanism to 
ensure that it responds to the user’s needs. Providing an 
organizational structure to a meta resource involves the 
following: 


6.14.1. Using Classification / Controlled Vocabulary 


The purpose of designing a meta resource is to retrieve 
precise information of high relevance. The general-purpose 
automated Internet search engines fail to achieve this goal, 
because they search a vast number of heterogeneous 
documents by mechanically matching words in HTML 
documents. Further, the Internet search engines do not attach 
any semantic context to the terms matched. Meta 
resources stand apart as datasets of organized and 
structured information resources where the general-purpose 
Internet search engines fail. The traditional tools of 
organization, such as library classification systems, are very 
useful for subject-guided searches. The classification number 
allows a subject tree to be created automatically, so that the 
resources can be browsed efficiently. Standard classification 
schemes, such as the Universal Decimal Classification (UDC) 
scheme, can be used to impose a structure on a meta 
resource. The use of familiar classification schemes allows 
users to quickly find the resources that will interest 
them. The publisher of a meta resource can determine which 
classification scheme to use for providing an organizational 
structure to the resources. 
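How a subject tree can be derived from classification numbers may be sketched as follows. The sketch assumes a purely decimal notation in which every additional digit marks a subdivision, so that each prefix of a number becomes a level of the tree; real UDC notation is richer than this (auxiliary signs, colons and so on), and the class numbers and resource titles below are illustrative. Dots are dropped when the tree keys are formed.

```python
from typing import Dict, List, Tuple


def new_node() -> Dict:
    return {"resources": [], "children": {}}


def build_subject_tree(resources: List[Tuple[str, str]]) -> Dict:
    """Build a prefix tree from (classification number, title) pairs."""
    root = new_node()
    for number, title in resources:
        digits = number.replace(".", "")
        node = root
        # Walk down one level per digit, creating levels as needed.
        for end in range(1, len(digits) + 1):
            node = node["children"].setdefault(digits[:end], new_node())
        node["resources"].append(title)
    return root


def render(node: Dict, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, classes in sorted order."""
    lines = []
    for key in sorted(node["children"]):
        child = node["children"][key]
        lines.append("  " * depth + key)
        for title in sorted(child["resources"]):
            lines.append("  " * (depth + 1) + "- " + title)
        lines.extend(render(child, depth + 1))
    return lines


catalogue = [
    ("53", "Physics e-journals"),
    ("51", "Mathematics portal"),
    ("539.1", "Nuclear physics database"),
]
for line in render(build_subject_tree(catalogue)):
    print(line)
```

Because the tree is built from the numbers alone, adding a resource with a new class number extends the browsing structure without any manual editing.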


Most meta resources have developed their own subject 
scheme while others use existing subject headings, thesauri 
or a controlled vocabulary. Access to information 
resources on a meta resource site can also be provided using 
subject headings only. Access points for an information 
resource may also include author, title and format (e-directory, 
e-journal, e-book, software, e-encyclopedia, etc.). 
Access may also be provided for a particular audience such 
as children, faculty, staff, people with disabilities, etc. A meta 
resource site may provide a number of simultaneous 
organizational schemes, offering users the chance to define 
their own options through a pull-down selection menu and 
clickable selection boxes. 


6.14.2. Back-end Database Support for a Meta Resource 


A professionally designed meta resource site would 
require that metadata about the information resources in it 
be maintained in a database with a pre-defined data structure. 
A back-end database gives a meta resource site a 
strong organizational benefit. The web-enabled 
database would allow a user to conduct searches on the 
information resources available on the site and generate 
portions of the site on-the-fly. In addition to a user interface, the 
back-end database for a meta resource site may also have an 
administrative interface, available via the web, which would 
enable the staff to add, modify and delete records and assign 
keywords using a controlled vocabulary. A formal database 
ensures consistency among records and makes it easier to 
maintain the collection over time. A database-driven meta 
resource makes it easy to re-purpose and re-organize the information 
resources contained in it, allowing the meta resource 
administrator to publish versions of the same data in different 
ways or on different sites, or to publish versions in non-web 
formats. The software for a meta resource site would consist 
of the following components: 


Database Containing Meta Data on Information 
Resources : The database containing meta data about 
Internet information resources can be designed using any 
standard database management system such as Microsoft 
Access, Oracle, MySQL, MS SQL Server or PostgreSQL. 
This database would contain details of each Internet 
information resource along with its description and a link to 
the resource. 
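A minimal data structure for such a database might look like the sketch below. SQLite is used here only because it ships with Python; it stands in for the database management systems named above. The table names, column names and the sample record are assumptions made for illustration.

```python
import sqlite3

# Two tables: one row per resource, plus the keywords assigned to it.
SCHEMA = """
CREATE TABLE resource (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    url         TEXT NOT NULL UNIQUE,
    creator     TEXT,
    description TEXT
);
CREATE TABLE keyword (
    resource_id INTEGER NOT NULL REFERENCES resource(id),
    term        TEXT NOT NULL
);
"""


def create_database(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_resource(conn, title, url, creator, description, keywords):
    """Insert one resource and its keywords; return the new record id."""
    cur = conn.execute(
        "INSERT INTO resource (title, url, creator, description) "
        "VALUES (?, ?, ?, ?)",
        (title, url, creator, description),
    )
    resource_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO keyword (resource_id, term) VALUES (?, ?)",
        [(resource_id, term) for term in keywords],
    )
    conn.commit()
    return resource_id


db = create_database()
new_id = add_resource(
    db,
    "Example Subject Gateway",           # placeholder record
    "http://www.example.org/",
    "Example University Library",
    "A gateway to resources in library science.",
    ["library science", "gateways"],
)
print(new_id, db.execute("SELECT COUNT(*) FROM keyword").fetchone()[0])
```

The `UNIQUE` constraint on `url` is one simple way in which a formal database enforces consistency: the same resource cannot be entered twice under the same address.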


Web Interface to the Database : The ODBC (Open 
Database Connectivity) drivers for most of the important 
databases are built into the operating system. However, 
Common Gateway Interface (CGI) programs, often written 
in Perl, are used to write such interfaces in case ODBC 
is not available. Moreover, scripting languages are used to 
develop complex web applications even if ODBC is available 
for a database. Common tasks, however, can be accomplished 
using Active Server Pages (ASP) technology and 
VBScript. ASP technology requires Windows NT’s Internet 
Information Server. Unix and other compliant web servers can use 
ASP with an add-on available from Chili!Soft or other vendors. 
Java programs, namely applets that run on a client 
machine and servlets that run on a server, are also used for 
developing a web interface to a database. 


Browsing and Search Software : ASP technology using 
VBScript may be used to provide a user-friendly browsing 
and search interface. On conducting a search, the interface 
would query the database and derive the data from the back- 
end database containing the meta data on Internet information 
resources and present it to the user on-the-fly. 
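The query-and-present cycle described above can be sketched in a few lines. Python and SQLite are used here in place of ASP and VBScript purely for illustration; the table layout, the `search` and `render_results` functions and the sample rows are assumptions, not part of any particular product.

```python
import html
import sqlite3


def search(conn, term):
    """Return (title, url) rows whose title or description contains `term`."""
    # The term is passed as a bound parameter, never pasted into the SQL.
    # Note that % and _ inside the term act as LIKE wildcards.
    pattern = f"%{term}%"
    return conn.execute(
        "SELECT title, url FROM resource "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY title",
        (pattern, pattern),
    ).fetchall()


def render_results(rows):
    """Build an HTML fragment for the result rows on-the-fly."""
    if not rows:
        return "<p>No resources found.</p>"
    items = [
        f'<li><a href="{html.escape(url, quote=True)}">{html.escape(title)}</a></li>'
        for title, url in rows
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# A small in-memory database with placeholder records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource (title TEXT, url TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO resource VALUES (?, ?, ?)",
    [
        ("Physics e-journals", "http://example.org/physics", "Journals in physics"),
        ("Chemistry databases", "http://example.org/chem?a=1&b=2",
         "Online databases for chemists"),
    ],
)

print(render_results(search(conn, "physics")))
```

Escaping the values before they are placed in the page matters because the titles and URLs come from data entry, not from the page designer.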


Site Administration and Maintenance : Web-based 
interfaces need to be developed to facilitate site 
administration, maintenance and updating of the database and 
the meta resource. A web-based interface would also be 
required for data entry of surrogate records from multiple 
locations. Administrative interfaces are also required to 
generate administrative reports and statistics in various 
formats. 


6.14.3. Navigation System 


A meta resource site, like any other website, would 
require a navigation system that ensures that users can move 
efficiently between and amongst the major areas and 
hierarchical levels of the site. A meta resource site should 
have a local navigation system that provides access to its 
other parts. 


6.14.4. Labeling System 


Labels are the terms that describe an entity on a meta 
resource. A labeling system communicates information 
without taking up too much vertical space or a user’s 
cognitive space. In a meta resource site, the labeling system 
reflects the organization and navigation systems of the 
collection and describes the information resources 
themselves. A label should precisely define the entity in the 
fewest words possible and as clearly as possible. Labels 
should use language appropriate for the collection’s audience. 
Standard subject headings like Library of Congress Subject 
Headings (LCSH), Medical Subject Headings (MeSH), 
Subject Headings in Engineering (SHE) may be considered 
for meta resources intended for scholars, scientists and 
technologists, while it would be better to use normal speech 
for children, students and lay persons. 


6.14.5. Feedback 


To keep the meta resource site useful and relevant to 
the user, it is important that the developer of a meta resource 
analyzes its usage and incorporates feedback from the users. 
The meta resource developer should deploy tools and 
techniques to learn how users enter the collection and which pages 
and resources are used most or least. The meta resource 
should allow users to provide feedback about the services 
and report errors such as broken links. A form for users to 
suggest new resources may also be incorporated. 


The researcher may create a skeleton record, completing the 
elements that can be supplied most readily. The results may then 
be supplemented or reviewed by the cataloguer for 
consistency. Two major projects providing metadata tools 
and services for the Dublin Core are the Nordic Web Project 
and MetaWeb in Australia. The Nordic Web Project provides 
metadata creation software and Dublin Core to MARC 
conversion software which is free within the European Union. 
MetaWeb has developed a metadata editor called “Reggie.” 


There are a number of FGDC-compliant metadata 
creation tools, including Metamaker, which was developed 
by the U.S. Geological Survey, Biological Resource Division, 
to create FGDC-compliant metadata. Some FGDC-compliant 
products have been developed by geographic information 
system (GIS) vendors to support the documentation of 
geospatially referenced datasets stored within their products. 
While many of these are proprietary, there are efforts under 
way through the Open GIS Consortium to support an open 
metadata tool. Extraction tools, which analyze the resource and 
automatically create a metadata record, are also available. 
The Nordic Web Project has developed software to extract 
metadata from a selected Web site and create preliminary 
Dublin Core records. CORC also creates an initial metadata 
record by extracting key information from the resource itself. 
It then builds a “pathfinder” which brings together links to 
resources. 




The creation of metadata both automatically and by 
people such as researchers who are not familiar with 
intellectual control raises several key issues. While attempts 
have been made for originators to provide metadata for their 
resources, in some cases the quality is less than desirable 
due to inconsistency, omission of important elements, or lack 
of controlled vocabulary. This can be solved by a review cycle 
by information professionals. However, these additional 
procedures increase the cost of the metadata creation. 


For any given project, there must be a balance between 
optimal quality and the resources available. There are two 
keys to solving some of these challenges. The first is adequate 
training and awareness among metadata creators and data 
originators. Originators should be encouraged to use 
metadata creation tools and to think about entering consistent 
information. The other solution comes from metadata tool 
developers. Both commercial and proprietary software tools 
are addressing the need for quality. The tools are beginning 
to support improved validation rules, pick lists that limit the 
selection in a particular field, and the use of authority files 
and controlled vocabularies. Software may also support templating 
and other customization to streamline data entry. 
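The kinds of checks mentioned above (validation rules, pick lists and controlled vocabularies) can be sketched as a single validation routine. The required elements, the pick list and the vocabulary below are hypothetical examples; a real tool would load them from the scheme and the authority files in use.

```python
# Hypothetical rules for illustration only.
REQUIRED = ("title", "url", "creator")
PICK_LISTS = {"format": {"e-book", "e-journal", "database", "software"}}
CONTROLLED_VOCABULARY = {"physics", "chemistry", "library science"}


def validate(record):
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    # Validation rule: required elements must be present and non-blank.
    for name in REQUIRED:
        if not str(record.get(name, "")).strip():
            problems.append(f"missing required element: {name}")
    # Pick lists: a value, if given, must be one of the allowed choices.
    for name, allowed in PICK_LISTS.items():
        value = record.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name} not in pick list: {value}")
    # Controlled vocabulary: every subject term must be an approved term.
    for term in record.get("subjects", []):
        if term.lower() not in CONTROLLED_VOCABULARY:
            problems.append(f"subject not in controlled vocabulary: {term}")
    return problems


draft = {"title": " ", "format": "pamphlet", "subjects": ["astrology"]}
for problem in validate(draft):
    print(problem)
```

Run at data entry time, a routine like this lets the originator correct a record immediately, which reduces the amount of later review by information professionals.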


6.15. METADATA AND THE STANDARDS PROCESS 


Many of the metadata schemes described here were 
developed by consensus within specific communities. Some 
of these are now seeking acknowledgment from national and 
international standards bodies such as NISO. 


Z39.50 is a NISO and ISO standard protocol for cross-system 
search and retrieval. It is described in Panel 6.3. 


Panel 6.3. Z39.50 


Z39.50 is a protocol, developed by the library 
community, that permits one computer (the client) to 
search and retrieve information on another (the 
database server). Z39.50 is important both technically 
and because it is widely used in library systems. In 
concept it is not tied to any particular category of 
information or any particular type of database, but 
much of the development has concentrated on 
bibliographic data. Most implementations emphasize 
searches that use bibliographic attributes to search 
databases of MARC catalog records and present them 
to the client. 


Z39.50 is based on an abstract view of database
searching. It assumes that the server stores a set of
databases with searchable indexes. Interactions are
based on the concept of a session. The client opens a
connection with the server, carries out a sequence of
interactions, then closes the connection. During the
course of the session, both the server and the client
remember the state of their interaction. It is important
to understand that the client is a computer. End-user 
applications of Z39.50 require a user interface for 
communication with the user. The protocol makes no 
statements about the form of the user interface or about 
how it connects to the Z39.50 client. 


A typical session begins with the client connecting to
the server and exchanging initial information, using
the init facility. This initial exchange establishes
agreement on basics, such as the preferred message
size; it can include authentication, but the actual form
of the authentication is outside the scope of the
standard. The client might then use the explain service
to inquire of the server what databases are available
for searching, what fields are available, what syntax
and what formats are supported, and other options.


The search service allows a client to present a query
to a database, as in the following example:


In the database named “Books” find all records for
which the access point title contains the value
“evangeline” and the access point author contains the
value “longfellow.”


The standard provides several choices of syntax for 
specifying searches, but only Boolean queries are 
widely implemented. The server carries out the search 
and builds a results set. One distinctive feature of 
Z39.50 is that the server saves the results set. A 
subsequent message from the client can reference the 
results set. Thus, the client can modify a large set by 
increasingly precise requests, or can request a 
presentation of any record in the set, without searching
the entire database.


Depending on the parameters of the search request, 
one or more records may be returned to the client. 
The standard provides a variety of ways for clients to 
manipulate results sets, including services to sort 
them and to delete them. When the searching is
complete, the next step is likely to be that the client 
sends a present request. This requests the server to 
send specified records from the results set to the client 
in a specified format. The present service has a wide 
range of options for controlling content and formats, 
and for managing large records or large results sets. 


In addition to the basic services, Z39.50 has facilities 
for the browsing of indexes, for access control, and 
for resource management, and it supports extended 
services that allow a wide range of extensions. This is
a large and flexible standard.
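The session model described in this panel can be pictured with a toy in-memory sketch. This is not the Z39.50 wire protocol and does not use any real Z39.50 library; the class name, method names and record fields are assumptions chosen to mirror the init, explain, search, present and delete facilities named above.

```python
class ToyZ3950Server:
    """Toy model of a stateful search server (illustrative only)."""

    def __init__(self, databases):
        self.databases = databases   # database name -> list of record dicts
        self.result_sets = {}        # session state: set name -> records
        self.initialized = False

    def init(self, preferred_message_size=65536):
        # Initial exchange: agree on basics such as message size.
        self.initialized = True
        return {"accepted": True, "message_size": preferred_message_size}

    def explain(self):
        # Report which databases exist and which access points they offer.
        return {
            name: sorted({key for rec in recs for key in rec})
            for name, recs in self.databases.items()
        }

    def search(self, database, query, result_set="default"):
        # query maps access point -> value; all clauses must match (AND).
        if not self.initialized:
            raise RuntimeError("session not initialized")
        hits = [
            rec for rec in self.databases[database]
            if all(value.lower() in rec.get(point, "").lower()
                   for point, value in query.items())
        ]
        # The server keeps the result set so later requests can refer to it.
        self.result_sets[result_set] = hits
        return len(hits)

    def present(self, result_set="default", start=1, count=1):
        # Return selected records from a saved result set (1-based start).
        records = self.result_sets[result_set]
        return records[start - 1:start - 1 + count]

    def delete(self, result_set):
        self.result_sets.pop(result_set, None)

    def close(self):
        # Closing the session discards all remembered state.
        self.result_sets.clear()
        self.initialized = False
```

The point of the sketch is the saved result set: the search call returns only a count, and the records themselves travel to the client later, when a present request names the set.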




One of the principal applications of Z39.50 is for 
communication between servers. A catalogue system at a 
large library can use the protocol to search a group of peers 
to see if they have either a copy of a work or a catalogue 
record for it. End users can use a single Z39.50 client to search
several catalogues, sequentially or in parallel. Libraries and
their patrons gain considerable benefits from sharing 
catalogues in these ways, yet interoperability among public 
access catalogues is still patchy. Some Z39.50
implementations have features that others lack, but the 
underlying cause of the patchiness is that the individual 
catalogues are maintained by people whose first loyalty is to 
their local communities. Supporting other institutions is never 
the first priority. Even though institutions share compatible 
versions of Z39.50, differences in how the catalogs are 
organized and presented to the outside world remain. 


6.16. TOOLS FOR BUILDING UP A META RESOURCE 


There are several off-the-shelf software packages
available for developing meta resources. These include:

— Knowledge Cite Library (Silver Platter) (http://www.knowledgecite.com/)

— Database Adviser (http://scilib.uscd.edu/proj/dba/)

— Pharos (http://uias.calstate.edu/)

— Northern Light (http://northernlight.com/)

— ROADS (http://www.roads.lut.ac.uk/)

Two important tools that may be used to set up a subject-based
meta resource are discussed below. While ROADS
provides a directory structure to a meta resource allowing
comprehensive coverage of resources on a given topic,
Ht://Dig follows a robot-based search approach by subject terms
to look for information and update the databases that form
the basis of retrieval systems in a semi-automated
environment.




6.16.1. ROADS (Resource Organization and Discovery in 
Subject-based Services) 


ROADS is a set of software tools and standards 
designed to set up and maintain meta resources for all kinds 
of Internet resources, including WWW sites, Telnet-based 
services, FTP sites, mailing lists, etc. It is a project of the 
Centre for Computing in the Social Sciences, University of 
Bristol. ROADS-based meta resources are developed on a 
database that contains information about information 
resources. The ROADS software is written in Perl. It runs
on any modern version of the Unix operating system, such 
as Linux. ROADS is designed to overcome the problems of 
general Internet search engines as it allows design of meta 
resources that include resources that are fully described or 
abstracted, and classified according to a recognized 
classification scheme, allowing resources to be located much 
more efficiently. In contrast to the automated search conducted
by general search robots, ROADS-based meta resources are
maintained by human subject experts, thus ensuring more
precision in retrieval.


Steps in Creation of the Meta Resources using 
ROADS : The three basic steps involved in creating meta 
resources using ROADS are as follows: 


— Creation and maintenance of records in a database of 
resource descriptions; 


— Automatic generation of web pages on-the-fly using the 
information in the database records; and 


— Indexing, search and retrieval engine that allows the 
database to be interrogated using simple keyword, or 
more complex Boolean searches. 
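The three steps listed above can be sketched in a few lines. ROADS itself is written in Perl and uses its own record templates; the Python below is only an illustration, and the record fields (handle, title, url, description, subject) and function names are hypothetical.

```python
import html
import re


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Step 3a: map each word to the handles of the records containing it."""
    index = {}
    for rec in records:
        text = " ".join([rec["title"], rec["description"], rec["subject"]])
        for word in tokenize(text):
            index.setdefault(word, set()).add(rec["handle"])
    return index


def search(index, terms, operator="AND"):
    """Step 3b: simple Boolean search; returns a set of record handles."""
    sets = [index.get(term.lower(), set()) for term in terms]
    if not sets:
        return set()
    if operator == "AND":
        return set.intersection(*sets)
    return set.union(*sets)


def render_subject_page(records, subject):
    """Step 2: generate an HTML listing for one subject heading."""
    items = [
        '<li><a href="{}">{}</a>: {}</li>'.format(
            html.escape(rec["url"], quote=True),
            html.escape(rec["title"]),
            html.escape(rec["description"]),
        )
        for rec in records
        if rec["subject"] == subject
    ]
    return "<h1>{}</h1>\n<ul>\n{}\n</ul>".format(
        html.escape(subject), "\n".join(items)
    )
```

Step 1, the creation and maintenance of records, corresponds here to keeping a list of dictionaries, one per resource description; the pages and the index are both derived from that list, so a corrected record only needs to be edited in one place.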


Creation and Maintenance of the Database: ROADS
offers tools that help in the creation of records and facilitate
their maintenance and editing. Records can be entered
manually through any text-editing program, but it is
recommended that the tools that ROADS provides be used
for creation and maintenance of the database. These tools
are made available as web forms that can be filled in through
Internet browsers. The tools also help by automatically filling
in certain attributes, such as the record Handle (the unique
identifier for the record) and the date and time the record was
created. The first record creation screen allows the user to select
whether a new record is being created or an existing one edited.
It also guides selection of the type of record that is to be created.
The record creation process allows for a number of options.
The record text can be returned to the screen (useful for
checking the record before submitting it to the database),
e-mailed to the database administrator, or entered into the
database. Options are provided to select whether the resource
should be added to the “Subject Listings” and the “What’s
New” listing at this stage.


Once a record is created, it must be edited to keep it 
up-to-date. To edit a record, the edit option may be selected 
from the main template creation screen, and the handle of 
the record that has to be edited is required to be furnished. 
As the handles are rather long, this can often be difficult. 
Fortunately, ROADS provides another means of locating 
records for editing. A search is entered in the same way as a
normal search. The search results have a button after each
resource that will display the record creation form with the 
fields already filled-in, ready to be edited. 


Subject Headings: In order to allow users of a ROADS-based
meta resource to browse the database of resource
descriptions, ROADS provides tools to create a set of subject 
listings consisting of a top-level listing of all the subject 
headings followed by listing of all the resources that come 
under a particular subject heading. The resource listing can 
contain links to both the resource itself and the corresponding
resource description. A resource can be inserted in the
appropriate subject listing when the record is created by 
checking an option provided for the purpose at the bottom of 
the form. 


Processing of Records : ROADS incorporates tools 
that automatically process the database records to create 
HTML pages that allow the user of an information gateway to 
browse the resource descriptions efficiently, allowing the user 
to quickly find the resources that interest them. This includes 
creating a subject menu according to the selected 
classification scheme. 


As soon as a record is entered into the database, it is 
automatically indexed. This allows for very quick and efficient 
searching of the database. An easy to use web search form 
is provided to allow the database to be interrogated by the 
users. 


The advantages of using ROADS can be summarized 
as follows: 


— ROADS places the provision of meta resources with the
subject specialists on whose knowledge they are based.
This is essential for a subject-based service that filters
out useless resources.


— ROADS is based on the use of web forms. It includes a 
range of tools to automate publishing of a meta 
resource. 


— ROADS allows users of a meta resource a transparent 
means of locating and accessing Internet resources. 


— ROADS is based on a range of standards that allow 
information about resources to be easily exchanged and 
stored. It is intended to enable multi-disciplinary 
searches to be carried out across multiple ROADS- 
based meta resources. 


6.16.2. Ht://Dig


Ht://Dig is another useful tool for setting up a configured
search engine. The ht://Dig system is a complete World Wide
Web indexing and searching system for a domain and is
free software available under the GNU General Public License
(GPL). Ht://Dig was developed at
San Diego State University as a way to search various web
servers on the campus network. It compiles a database of all
the documents that are specified for inclusion in the
search and then performs the search. It has many features
that help to customize the search domain and also to get
output in desired formats. It performs the tasks in the following
three steps:


Digging : Digging is the first step in creating a search 
database using ht://Dig software. This system uses the word
digging, while other systems call it harvesting or gathering. 
In the ht://Dig system, the program ‘htdig’ performs the 
information gathering stage. In this process, the program acts 
as a regular web user, except that it follows all hyperlinks 
that it comes across. The digging process will create at least 
two files. The first one consists of a list of all the words and 
the second one is a database of URLs and information about 
the URLs. 


Merging : Once the digging process is complete, the 
data are converted into a format, which the search engine 
can read. The ‘htmerge’ program does this. The term “merge” 
is used because data from several databases is gathered 
together and merged into several other databases. The 
source databases include the databases created by not only 
the latest “dig” but also any previously merged databases. 
The latest dig will produce a database that provides 
information on new pages and information on changes to 
previously existing pages; the information on the new pages, 
and the new information on changes to old pages are merged
with the unchanged information to create up-to-date
databases.


Searching : All the information gathered and organized
during the dig and merge stages is used in searching. The
‘htsearch’ program performs the actual searches. It is a CGI
(Common Gateway Interface) program that takes the HTML
“search form” as input, performs the search and produces the
HTML output which users see.
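The dig, merge and search stages can be imitated with a toy sketch that works on an in-memory "web" instead of real HTTP requests. It does not reproduce the actual htdig, htmerge or htsearch programs or their file formats; the function names and data structures are assumptions made for illustration.

```python
import re
from collections import deque


def tokenize(text):
    """Return the set of lowercase alphanumeric words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dig(web, start_url):
    """Follow every hyperlink reachable from start_url.

    web maps url -> (page text, list of linked urls).
    Returns two structures: the words found on each page,
    and a database of URLs with information about them.
    """
    page_words, url_db = {}, {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in url_db or url not in web:
            continue  # already visited, or link leads outside the toy web
        text, links = web[url]
        page_words[url] = tokenize(text)
        url_db[url] = {"links": list(links)}
        queue.extend(links)
    return page_words, url_db


def merge(previous, latest):
    """Combine an earlier dig with the latest one.

    New and changed pages replace old entries; unchanged pages are kept.
    Returns merged page words, merged URL database and a word index.
    """
    page_words = {**previous[0], **latest[0]}
    url_db = {**previous[1], **latest[1]}
    index = {}
    for url, words in page_words.items():
        for word in words:
            index.setdefault(word, set()).add(url)
    return page_words, url_db, index


def search(index, query):
    """Return the sorted URLs of pages containing every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))
```

Because the merge step rebuilds the index from the merged page data, words that disappeared from a changed page stop matching, while pages that were not revisited in the latest dig remain searchable.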


Using Ht://Dig, both HTML documents and plain text files
can be searched. Searches can be complex, using Boolean
expressions. Searches can be performed using various
configurable algorithms, such as exact matches, common
word endings, synonyms, etc. Any number of keywords can
be added to HTML documents that will not show up when the
document is viewed. This is used to make a document more
likely to be found and also to make it appear higher in the list
of matches. Another important feature is that the output of a
search can be customized. It would be interesting to output
the search results in a Dublin Core-like format if all
information required is available in the retrieved resources.


6.17. IMPORTANT META RESOURCES 


There are several thousand general-purpose and 
subject-specific meta resources and portal sites available on 
the Internet. A few important meta resources are listed here 
and discussed briefly. 


6.17.1. LibrarySpot.com (http://www.libraryspot.com) 


LibrarySpot is a free virtual library resource centre for 
educators and students, librarians and their patrons, families, 
businesses and just about anyone exploring the web for 
valuable research information. LibrarySpot.com aims at
breaking through the information overload of the web and
bringing the best library and reference sites together. Sites
featured on LibrarySpot.com are hand-selected and reviewed
by an editorial team for their exceptional quality, content and
utility. Published by StartSpot Mediaworks, Inc. in the 
Northwestern University Evanston Research Park, 
LibrarySpot is the first in a family of vertical information portals 
designed to make finding the best topical information on the 
Internet a quick, easy and enjoyable experience.
LibrarySpot.com has received more than 30 awards and
honours. Most recently, Forbes.com selected 
LibrarySpot.com as a “Forbes Favourite” site, the best in the 
reference category, and PC Magazine named it one of the 
Top 100 Websites. LibrarySpot.com has been featured on 
CNN, Good Morning America, CNBC and in many other 
media outlets. 


6.17.2. Librarians’ Index to the Internet (LII) (http://lii.org/)


The Librarians’ Index to the Internet (LII) consists of 
more than 8,600 Internet resources selected and evaluated 
by librarians for their usefulness to users of public libraries. 
A free e-mail subscription to LII New This Week
(http://www.lii.org/search/ntw) delivers the most recent
resources added to the LII. It has close to 12,000 subscribers
in 85 countries. LII also offers co-branding service to the 
libraries that are members of the Library of California. The 
site provides both browsing and searching interfaces. 


6.17.3. Argus Clearing House (http://www.clearinghouse.net/)


The Argus Clearing House is a guide to the meta 
resources. It provides a central access point for value-added 
topical guides that identify, describe and evaluate Internet- 
based information resources. Argus Clearinghouse is a non- 
profit venture run by a small group of dedicated individuals. 
The Argus Clearinghouse is intended to be a resource that
brings together finding aids for students, researchers,
educators, and others interested in locating authoritative
information on the Internet.


6.17.4. Galaxy (http://galaxy.einet.net/) 


Galaxy, originally a prototype associated with the 
DARPA-funded Manufacturing Automation and Design 
Engineering (MADE) program, is the oldest browsable / 
searchable web directory. It is a searchable Internet directory 
with a mission to provide contextually relevant information 
by integrating state-of-the-art technology with the human 
touch. Galaxy employs the best of technology and human
expertise to organize information in a way that makes it both
understandable and highly relevant. The contents of the meta
resource are compiled and organized by human Internet
Librarians rather than by computer. The Galaxy hierarchy is
built utilizing a vertical structure, i.e., the information on
particular topics is very deep in content. While other search
technologies may yield millions of pages per search, Galaxy
provides concentrated, relevant results.


6.17.5. Direct Search (http://gwis2.circ.gwu.edu/~gprice/direct.htm)


Direct Search is a growing compilation of links to the 
Internet resources that contain data not easily or entirely 
searchable / accessible from general search tools like AltaVista,
Google, or HotBot. Direct Search has its own search
interface.


6.17.6. Vlib: The Virtual Library (http://www.vlib.org/) 


The Virtual Library is the oldest catalogue of the web,
started by Tim Berners-Lee, the creator of HTML and the web
itself. Unlike commercial catalogues, it is run by a loose
confederation of volunteers, who compile pages of key links
for particular areas in which they are experts. Even though it
is not the biggest index of the web, the Virtual Library pages
are widely recognized as being amongst the highest-quality
guides to particular sections of the web. Individual indexes
live on hundreds of different servers around the world. A set 
of catalogue pages linking these pages is maintained at 
http://vlib.org. Mirrors of the catalogue are kept at East Anglia 
(UK), Geneva (Switzerland) and Argentina. Each maintainer 
is responsible for the content of their own pages, as long as 
they follow certain guidelines. 


6.17.7. Academic Info (http://www.academicinfo.com/) 


Academic Info, online since 1998, began as an 
independent Internet subject directory owned by Michael 
Madin and maintained with the assistance of a quality group 
of subject specialists. In the spring of 2000, Michael left the 
University of Washington Gallagher Law Library to focus 
solely on Academic Info and in 2002 Academic Info became 
a registered non-profit organization of the State of 
Washington. Academic Info is now ad-free and relies on 
donations to remain online. Academic Info aims to be the 
premier educational gateway to online high school, college 
and research level Internet resources. The primary focus of 
the site is academic, with its intended audience at the upper 
high school level or above. As a priority it adds digital 
collections from libraries, museums, and academic 
organizations and sites offering unique online content. The 
current focus is on English language resources but sites in 
other languages will be selectively considered. 


6.17.8. BUBL (http://bubl.ac.uk/)


BUBL LINK is the catalogue of selected Internet 
resources covering all academic subject areas which are 
catalogued according to DDC (Dewey Decimal Classification). 
All items are selected, evaluated, catalogued and described. 
Links are checked and fixed each month. LINK stands for 
Libraries of Networked Knowledge. BUBL 5:15 provides an
alternative interface to this catalogue, based on subject terms
rather than DDC. Big subject areas are broken down into
smaller categories. However, the upper limit of 15 items per
subject is not rigidly
applied, so there may be up to 35 items for some subjects.
The subject terms used in BUBL LINK / 5:15 were originally
based on LCSH (Library of Congress Subject Headings) but
have been heavily customized and expanded to suit the
content of the service. The aim is to make it very easy to
locate Internet information about all academic subject areas.
The BUBL LINK catalogue currently holds over 11,000
resources, which is far fewer than the databases held by
major search engines, but it can provide a more effective route
to information for many subjects, across all disciplines.


6.17.9. BIOME (http://biome.ac.uk/) 


BIOME is a collection of gateways, which provide 
access to evaluated, quality Internet resources in the health 
and life sciences, aimed at students, researchers, academics 
and practitioners. A core team of information specialists and 
subject experts based at the University of Nottingham 
Greenfield Medical Library has created BIOME. The Internet 
resources are selected for their quality and relevance to a 
particular target audience. They are then reviewed and 
resource descriptions created, which are stored, generally 
with the associated metadata, and generally in a structured 
database. The consequence of this effort is to improve the 
recall and especially the precision, of Internet searches for a 
particular group of users. BIOME is a hub within the Resource
Discovery Network (RDN) (http://www.rdn.ac.uk), and is
funded by the Joint Information Systems Committee (JISC)
(http://www.jisc.ac.uk/). There are five dedicated subject
services, or gateways, within BIOME, each covering a
specific area within the health and life sciences. These
gateways are AgriFor, VetGate, OMNI, Natural Selection and
Bio Research.




6.17.10. The Scout Report (http://scout.cs.wisc.edu/report/sr/current/)


Scout Report is the flagship publication of the Internet
Scout Project. It is published every Friday both on the web and
by e-mail, and it provides a fast, convenient way to stay
informed of valuable resources on the Internet. A team of
professional librarians and subject matter experts selects,
researches, and annotates each resource. It has been published
continuously since 1994, and it is one of the Internet’s oldest
and most respected publications. The Internet Scout Project
is located in the Department of Computer Sciences at the 
University of Wisconsin-Madison, and is funded by a grant 
from the National Science Foundation. 


6.17.11. LivingInternet.com (http://www.livinginternet.com) 


The mission of this website is to make comprehensive, 
in-depth information about the Internet available around the 
world. The site was developed from 1996 through 1999, 
posted on January 7, 2000, and is updated weekly. The site 
is equivalent to a book of more than 600 pages, with more 
than 2,000 intra-site links and 2,000 external links woven into 
the text, making it the first Internet publication of a reference 
work fully integrated with the web on this scale. Google ranks 
the site number one in the Internet courses category, and 
Yahoo lists it as one of the top three sites on Internet history. 


6.17.12. Edinburgh Engineering Virtual Library (http://www.eevl.ac.uk)


Edinburgh Engineering Virtual Library (EEVL) is an 
award-winning free service, which provides quick and reliable 
access to the best engineering, mathematics, and computing 
information available on the Internet. It is created and run by 
a team of information specialists from a number of universities 
and institutions in the UK for students, staff and researchers
in higher and further education, as well as anyone else
working, studying or looking for information in Engineering,
Mathematics and Computing. EEVL provides a central access
point to networked engineering, mathematics and computing
information. Resources being added to the catalogues are
selected, catalogued, classified and subject-indexed by
experts to ensure that only current, high-quality and useful 
resources are included. These include e-journals, databases, 
training materials, professional societies, university and 
college departments, research projects, bibliographic 
databases, software, information services and recruitment 
agencies. 


EEVL, in addition to Internet Resource Catalogues, 
provides targeted engineering search engines to UK 
engineering sites, to engineering e-journals and to 
engineering newsgroups, and to specialized information 
services, such as the Recent Advances in Manufacturing 
(RAM) bibliographic database, and the Offshore Engineering 
Information Service. MathGate at EEVL is involved in the 
Secondary Homepages Project for UK Mathematics 
Departments. EEVL’s scope is limited to the three subjects,
and is therefore more focused than the big search engines.
Searching EEVL will retrieve high quality resources, but
because EEVL’s resources are handpicked, the number of
sources covered is not comparable to that of the general
Internet search engines.


6.17.13. Social Science Information Gateway (http://sosig.ac.uk/)

The Social Science Information Gateway (SOSIG) is a 
freely available Internet service which aims to provide a 
trusted source of selected, high quality Internet information 
for students, academics, researchers and practitioners in the 
social sciences, business and law. It is a part of the UK
Resource Discovery Network. The SOSIG Internet Catalogue
is an online database of high quality Internet resources, which 
offers users the chance to read descriptions of resources 
available over the Internet and to access those resources 
directly. The Catalogue points to more than 21,000 resources, 
and each one has been selected and described by a librarian 
or academician. The catalogue is browsable or searchable 
by subject area. Social Science Search Engine is a database 
of over 50,000 Social Science Web pages. Whereas subject 
experts have selected the resources found in the SOSIG 
Internet Catalogue, those in the Social Science Search Engine 
have been collected by software called a “harvester”. All the 
pages collected stem from the main Internet catalogue which 
provides the equivalent of a social science search engine. 


6.17.14. Digital Librarian (http://www.digital-librarian.com)


This is maintained by Margaret Vail Anderson, a 
librarian in Cortland, New York. Internet information resources 
are catalogued according to subject categories and format- 
types. Digital Librarian does not have a search interface for 
the resources catalogued on the site. It has a browsing 
interface that gives hand-checked links and “see also” references
to related resources.


6.17.15. QUEST.net (http://www.re-quest.net/) 


QUEST.net is a free online library offering substantive,
fully annotated links to valuable resources in both a unique
framed version and a non-framed version. This website helps
students and professionals to locate day-to-day and much
needed information and resources in a relatively quick and
concise manner. It serves as a one-stop resource directory,
providing the Internet community with thousands upon
thousands of links, which its committee of web surfers has
found to be the most useful, informative and productive. The
meta resource provides a fully annotated description of each
link together with its URL, allowing visitors to know what to
expect from the website. Each link has been specially
hand-picked to provide visitors with the best and most relevant
links in each category. The website relies on
its devoted committee of web surfers, who work diligently,
day after day, sorting through the vast galaxies of cyberspace
to bring the best and most current resources available.


6.17.16. Internet Public Library (http://www.ipl.org/)


Internet Public Library is a product of the University of 
Michigan’s School of Information and Library Studies. It 
includes extensive directories of online texts, newspapers, 
magazines and reference materials, plus an exhibit hall and 
subject sections including Reference Center, Reading Room, 
Search Tools, Youth References, and Special Collections. It
gives links to more than 20,000 books, besides magazines
and online newspapers. It also features links to critical and
biographical sites dedicated to authors and their works, and
an online history of the Harlem Renaissance in New York 
between 1900 and 1940. 


6.17.17. BioMedNet (http://www.bmn.com/) 


BioMedNet is owned by Elsevier Science and is part of 
the Reed Elsevier group of companies. BioMedNet is the
website for biological and medical researchers. There are over
800,000 members of BioMedNet with more than 20,000 
people joining per month. Membership to BioMedNet is free 
and members can search all of BioMedNet without charge. 
However, viewing full-text articles from publishers often 
requires payment or a subscription. The site has links to more 
than 3500 reviewed information resources. The resource 
provides online access to more than 15,000 review articles. 
HMS Beagle: the BioMedNet Magazine is issued every
fortnight. The magazine can be subscribed to by e-mail or
can be accessed online.


Summing up the chapter, it can be concluded that
metadata play an important role in searching for and finding
relevant information among the huge volume of data on the
Internet and in the online environment.


7 
Digital Preservation 


Digital preservation refers to a series of managed
activities that are necessary to ensure continued access
to digital materials for as long as they are needed. The
term digitization refers to the conversion of material that was
originally created in another form to digital form, which uses
a binary numerical code to represent variables. The ultimate
goal of preservation is to keep the intellectual content
intact for as long as possible. The idea of protecting
original documents by reproducing them on stable media gave
rise to the digitizing of maps, manuscripts, moving images,
music, sounds, etc. Digitization of old and fragile
material not only provides long-term preservation but also
allows users to find, retrieve, study and manipulate the
information in a colourful environment.


McKemmish has defined digital preservation as the means
that “enable reliable, authentic, meaningful and accessible
records to be carried forward through time within and beyond
organizational boundaries for as long as they are needed for
the multiple purposes they serve.” To Cornell
University Library, digital preservation encompasses a broad
range of activities designed to extend the usable life of 
machine-readable computer files and protect them from 
media failure, physical loss, and obsolescence. 


Digital preservation consists of the processes aimed
at ensuring the continued accessibility of digital materials. In
order to understand the tasks of digital preservation, we need
to devise some working definitions for the concepts of
“document” and “digital object.” David Levy has offered some
useful intuitive definitions: “Documents are talking things.
They are bits of the material world—clay, stone, animal skin,
plant fiber, sand—that we have imbued with the ability to speak.”
The fixity of microfilm and paper is generally considered to
be much greater than any of the digital media. The great 
advantage of digital media, the ease of copying and 
modification, also becomes a major liability. 


The phrase “digital preservation” creates confusion 
since readers, familiar with traditional approaches, assume 
that “preservation” involves the use of well-defined techniques 
to prevent the original artifact from deteriorating further and 
to perhaps even improve it to the point where it can be used 
again. Digital preservation involves quite different methods, 
skills, and outcomes and can complement traditional 
preservation services, while simultaneously providing unique 
and dynamic new uses of information.


Digital preservation includes the preservation of print 
and non-print material in digitized form for effective, efficient 
and purposeful use. It can be classified into three types:


• Long-term Preservation: Continued access to digital
materials, or at least to the information contained in
them, indefinitely.


• Medium-term Preservation: Continued access to digital
materials beyond changes in technology for a defined
period of time but not indefinitely.


• Short-term Preservation: Access to digital materials
either for a defined period of time while use is predicted
but which does not extend beyond the foreseeable future
and/or until it becomes inaccessible because of
changes in technology.


Digital preservation typically centers on the choice of 
interim storage media, the life expectancy of a digital imaging 
system, and the expectation to migrate the digital files to future 
systems while maintaining both the full functionality and the 
integrity of the original digital system. Digital materials can 
include everything from electronic publications on CD-ROM 
to online databases and collections of experimental data in 
digital format. 


Ultimately, the goal of digital preservation is to maintain
the ability to display, retrieve, and use digital collections in 
the face of rapidly changing technological and organizational 
infrastructures and elements. 


7.1. PRESERVATION DEFINITIONS IN THE DIGITAL 
WORLD 


What, then, are workable definitions of preservation in a
digital world? What concepts do these definitions need to 
accommodate? What difficulties do we encounter in trying to
develop definitions that will assist us in our attempts to 
preserve digital materials? 


The first point to note is the dynamic nature of the field. 
We must be prepared to change our paradigms and the 
definitions that we develop from them. In a discussion of how 
archivists’ thinking can better inform digital preservation, 
Gilliland-Swetland comments that the paradigms of any of
the information professions come up short when compared 
with the scope of the issues continuously emerging in the 
digital environment. An overarching dynamic paradigm - that 
adopts, adapts, develops, and sheds principles and practices 
of the constituent information communities as necessary - 
needs to be created. 

Another important concept that our definitions need to
accommodate is that of information being preserved,
independent of the media in which it resides. This is now well
accepted; indeed, old preservation paradigm practices were,
in recent decades, well acquainted with it through microfilming
programs. It is no longer a fact that the original has ‘more
integrity and veracity than a copy’; instead, in the digital world,
we need to look further to define what attributes of the digital
object we wish to maintain over time.


The definitions should also accommodate the social and 
organizational aspects of digital preservation, the ‘public 
policy, economic, political, social or educational perspective’ 
in Cloonan’s words. Old paradigm definitions certainly 
recognized that there was more to preservation than the 
technical aspects - the IFLA Principles suggest it also 
encompasses ‘managerial and financial considerations, ... 
staffing levels [and] policies’ - but definitions of digital 
preservation need to go considerably further. They must be 
extended because of yet another factor, the need to start 
preserving digital materials almost from the moment of their 
creation - and, some suggest, even before they are created. 


Preservation, in the pre-digital paradigm, was usually 
applied retrospectively. Conservation procedures were 
applied to artifacts only after the artifact, or the information 
contained in or on it, had been deemed to be of significance 
and therefore worth preserving for use in the future. For 
example, books printed before 1800 are typically considered 
to be significant because they are the product of handcraft 
production techniques and, therefore, no two items are 
identical; and some artifacts are preserved because they 
attain iconic status - an Australian example is the veneration
of relics of the nineteenth-century folk hero, Ned Kelly. We
could also rely on benign neglect, where lack of action did
not usually harm the item and, assuming certain factors such
as low use were in play, did not significantly affect the
likelihood of its survival. This concept no longer works for
digital materials. The 3.5-inch diskette, very common until
recently, provides a good example. Information on it is likely
to become unreadable for many reasons: the diskette may
be stored in conditions too humid or too hot, the drives to
read it may be superseded by newer technology and no longer
be available, or the driver software for that drive may no longer
be found. After a period of time, the diskette becomes
unusable. Active preservation needs to start close to the time
of creation if there is to be any certainty that the digital
information will be accessible in the future.


To this list we can add other concepts. One is that for 
digital materials ‘their preservation must be an integral 
element of the initial design of systems and projects’. 
However, this is not usually the case. Another is that digital 
materials exist in a bewilderingly large number of formats - 
there is still little standardization. And yet another, this one 
possibly the most significant to accommodate, is that the 
preservation of digital materials is much more than the 
preservation of information content or physical carrier alone. 
It is about preserving the intellectual integrity of information 
objects, including capturing information about the various 
contexts within which information is created, organized, and 
used; organic relationships with other information objects; and 
characteristics that provide meaning and evidential value. But
preserving the original bit-stream is only one part of the
problem; equally important is the requirement to preserve ‘the
means of interpreting, reading and utilizing the bit stream’.


The difficulties of definition are not helped by disciplinary 
differences — there are, for instance, differences in the way 
terms are used by archivists and librarians, although they 
are drawing closer in the digital environment, where ‘the 
integrity and authenticity of digital objects is of mutual concern 
to both professions’. It may be noted that integrity and 
authenticity are terms deriving from the archival profession 
and were, until recently, not usually associated with the work
of librarians. However, these differences pale in comparison
with the significantly different definitions used by the IT 
industry. 


How information technology professionals think about 
the long-term storage of digital data is a question that assumes 
some importance for digital preservation, because of the 
heavy reliance that information professionals place on their 
skills and services. There are abundant signs that the mindset 
of information technology professionals is significantly 
different when they think about preservation. Definitions of 
archive, archiving and archival storage give us some 
indication of their concerns. A selection of IT dictionaries from 
the shelves of the National Library of Australia’s Reference 
Collection in April 2003 and an internet search indicated that 
these terms were used in two ways: 


- The process of moving data to a different kind of storage
medium: for example, ‘archive: the process of moving
data stored on an online, direct access device to an
offline storage medium because its frequency of use
and timeliness permit a delayed access to the data’.


- The process of backing up data for long-term storage:
for example, ‘archive: to backup or make copies for
long-term storage’.


Few of these definitions display any interest or concern 
with the reasons why long-term storage might be required, 
although one notable exception was located: ‘archiving:
long-term storage of information on electronic media. Information
is archived for legal, security or historical reasons, rather than
for regular processing or retrieval’. Perhaps the mindset of IT
professionals is better indicated by this excerpt: ‘You detect
data that is not needed online and move it to an off-line store.
When someone wants to use it, go find the off-line media
and restore the data’.


There is no indication in these definitions of the period
of time that long-term refers to, yet this is a crucial point for
those who are concerned with preservation. An employee of
an Australian university, in a discussion of long-term storage
facilities, suggests access ‘for a number of years’, or ‘a
considerable time (years)’. While it is perhaps not especially
helpful to define long-term in terms of a specific number of
months or years, some awareness of the problems is required,
such as that evident in the NSF-DELOS Working Group’s
report Invest to Save, where the OAIS (Open Archival
Information System) definition of long-term is noted:


A period of time long enough for there to be 
concern about the impacts of changing 
technologies, including support for new media 
and data formats, and of a changing user 
community, on the information being held in a 
repository. 


What, then, are the definitions currently being used? 
Definitions have, of course, changed as we come to know 
more about how to preserve digital objects. The example of 
time - how long we want to preserve material for - illustrates 
this point. Long-term, initially incorporating elements of old- 
paradigm thinking of indefinitely, or as long as possible - as 
in ‘digital preservation means retaining digital image 
collections in a usable and interpretable form for the long 
term’- is now more commonly defined in the preservation 
community in terms derived from the archival community, as 
the period during which the information remains of continuing 
value. Such a definition is more helpful than the commonly 
encountered phrase ‘over time’ - as in ‘ensuring the integrity
of information over time’, and ‘digital preservation - the
processes and activities which stabilize and protect
reformatted and “born digital” authentic electronic materials
in forms which are retrievable, readable, and usable over
time’. These issues and concepts are covered in two of the
most comprehensive of the recent publications on digital 
preservation, the influential Preservation Management of 
Digital Materials: A Handbook by Jones and Beagrie, and 
the UNESCO Guidelines for the Preservation of Digital 
Heritage, 2003. Selected definitions from these two sources 
are presented in Table 7.1. 


A new term ‘curation’ is gaining currency with the 
establishment of the Digital Curation Centre in 2004, although 
it is too early to know whether it will become the standard 
term. Curation is defined as:


“the actions needed to maintain digital research 
data and other digital materials over their entire 
life-cycle and over time for current and future 
generations of users. Implicit in this definition are 
the processes of digital archiving and 
preservation but it also includes all the processes 
needed for good data creation and management, 
and the capacity to add value to data to generate 
new sources of information and knowledge ... ; it 
is the key to reproducibility and re-use ... Digital 
curation ... is about maintaining and adding value
to a trusted body of digital information for current
and future use”.


Table 7.1. Selected Definitions 


Preservation Management of Digital Materials (Jones and Beagrie)


Accessibility - Continued, ongoing usability of a digital resource,
retaining all qualities of authenticity, accuracy and functionality
deemed to be essential for the purposes the digital material was
created and/or acquired for.

Authenticity - The digital material is what it purports to be. In the
case of electronic records, it refers to the trustworthiness of the
electronic record as a record. In the case of “born digital” and
digitized materials, it refers to the fact that whatever is being
cited is the same as it was when it was first created unless the
accompanying metadata indicates any changes. Confidence in the
authenticity of digital materials over time is particularly crucial
owing to the ease with which alterations can be made.

Digital Materials - A broad term encompassing digital surrogates
created as a result of converting analogue materials to digital form
(digitization), and “born digital” for which there has never been
and is never intended to be an analogue equivalent, and digital
records.

Digital Preservation - Refers to the series of managed activities
necessary to ensure continued access to digital materials for as long
as necessary. Digital preservation is defined very broadly for the
purposes of this study and refers to all of the actions required to
maintain access to digital materials beyond the limits of media
failure or technological change.


Guidelines for the Preservation of Digital Heritage (UNESCO)


Accessibility - The ability to access the essential, authentic
meaning or purpose of a digital object. Digital materials cannot be
said to be preserved if access is lost. The purpose of preservation
is to maintain the ability to present the essential elements of
authentic digital materials.

Authenticity - Quality of genuineness and trustworthiness of some
digital materials, as being what they purport to be, either as an
original object or as a reliable copy derived by fully documented
processes from an original.

Digital heritage - Those digital materials that are valued
sufficiently to be retained for future access and use.

Digital materials - Generally used here as a preferred term covering
items of digital heritage at a general level. In some places, digital
object or digital resource have also been used. These terms have been
used interchangeably and generically.

Digital preservation - The processes of maintaining accessibility of
digital objects over time; ... is used to describe the processes
involved in maintaining information and other kinds of heritage that
exist in a digital form. In these Guidelines, it does not refer to
the use of digital imaging or capture techniques to make copies of
non-digital items, even if that is done for preservation purposes.

Information Packages - Preservation depends on maintaining digital
objects and any information and tools that would be needed in order
to access and understand them. Together, these can be considered to
form an information package that must be managed either as a single
object or as a virtual package with the object and associated
information tools linked but stored separately.

Preservation program - The set of arrangements, and those responsible
for them, that are put in place to manage digital materials for
ongoing accessibility; ... is used to refer to any set of coherent
arrangements aimed at preserving digital objects.


These definitions may assist by providing useful starting 
points for an extended discussion of digital preservation. In 
particular, they address some significant questions that we 
need guidance on: 


• What exactly are we trying to preserve?
• How long are we preserving them for?
• What strategies and actions do we need to apply when
we preserve them?


7.1.1. What Exactly are we Trying to Preserve?


One of UNESCO's thematic areas is culture, and, within 
that, heritage. Therefore the UNESCO Guidelines are 
primarily concerned with digital heritage, those digital 
materials that are valued sufficiently to be retained for future
access and use. Although this statement is general and does
not assist us to decide precisely what it is we want to preserve, 
it does introduce the essential concept of selection, of deciding 
value - in this case, that digital material on which high value 
is placed. 


More specific in the UNESCO Guidelines and in 
Preservation Management of Digital Materials: A Handbook 
are the definitions of digital materials - the specific digital 
items, objects or resources that we are concerned with. Both 
sets of definitions categorically state that they are not
concerned with the digitizing of analogue materials as
a preservation technique. Jones and Beagrie’s handbook
‘specifically excludes the potential use of digital technology 
to preserve the original artifacts through digitization’, and the 
UNESCO Guidelines are equally adamant, stating that digital 
preservation ‘does not refer to the use of digital imaging or 
capture techniques to make copies of non-digital items, even 
if that is done for preservation purposes’. Why such 
statements are thought to be worth making so firmly is 
explained by a strongly-held perception prevalent in the 
information professions that digitizing of analogue materials, 
often photographs or paper-based material, is sufficient for 
preservation purposes. The Parish Map Project created digital 
content from original paper artifacts, and descriptions of such 
projects are very commonly found in the literature. Not all 
information professionals seem to be aware of this distinction 
and of its consequences. One exception is Sassoon, who 
fervently argues that the lack of understanding about just what 
is happening when photographs are digitized, about ‘from 
where photographic meaning emanates’, results in the loss 
of crucial information and raises serious ethical issues about 
the destruction of information as a consequence of digitization. 


In terms of how they are preserved, though, the 
definitions in these two sources make no distinction between
born digital materials and digital materials created by digitizing
analogue materials. This is acknowledged in Jones and 
Beagrie’s definition of digital materials, which covers both 
‘digital surrogates created as a result of converting analogue 
materials to digital form (digitization), and “born digital” for 
which there has never been and is never intended to be an 
analogue equivalent, and digital records’. 


But it is not simply the bit-stream that we are seeking to 
preserve, as these definitions make clear. In order to ensure 
access in the future to digital materials, we also need to take 
account of other attributes of the digital materials. The 
UNESCO Guidelines indicate this in the definition of 
information packages. 


Table 7.2. Selected Definitions 


Guidelines for the Preservation of Digital Heritage 
(UNESCO, 2003) 


Conceptual objects — Digital objects as humans interact 
with them in a human-understandable form. 


Essential elements — The elements, characteristics and 
attributes of a given digital object that must be preserved 
in order to re-present its essential meaning or purpose. 
Also called significant properties by some researchers. 


Logical objects — Digital objects as computer encoding, 
underlying conceptual objects. 


Physical objects — Digital objects as physical phenomena 
that record the logical encoding, such as polarity states 
in magnetic media, or reflectivity states in optical media. 


In addition to the bit-stream, which is typically ‘not 
understandable or re-presentable’ by itself, ‘any information 
and tools that would be needed in order to access and 
understand’ the digital materials must also be preserved. 
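

The idea of an information package can be made more concrete with a
short sketch in Python. This is only an illustration and not a format
defined by the UNESCO Guidelines: the class name InformationPackage,
its field names and the sample values are all hypothetical. The sketch
bundles a bit-stream with the information and tools needed to access
and understand it, and reports which of these supporting elements have
not yet been recorded.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    """Hypothetical bundle: a digital object plus what is needed to use it."""
    bitstream: bytes                                     # the stored digital object
    file_format: str = ""                                # how the bytes are encoded
    rendering_tools: list = field(default_factory=list)  # software able to present it
    context_notes: str = ""                              # origin, purpose, related objects

    def missing_elements(self):
        """Return the names of supporting elements that are still empty."""
        missing = []
        if not self.file_format:
            missing.append("file_format")
        if not self.rendering_tools:
            missing.append("rendering_tools")
        if not self.context_notes:
            missing.append("context_notes")
        return missing


# A bare bit-stream versus a documented package (sample values are invented).
bare = InformationPackage(bitstream=b"Parish map, sheet 12")
complete = InformationPackage(
    bitstream=b"Parish map, sheet 12",
    file_format="text/plain; charset=ascii",
    rendering_tools=["any plain-text viewer"],
    context_notes="Caption transcribed from a digitized map.",
)

print(bare.missing_elements())
print(complete.missing_elements())
```

A package that reports missing elements still holds the bit-stream, but
in the sense used above it cannot yet be said to be preserved.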


The definitions are also very specific about the need to 
maintain other attributes of digital materials. To ensure that
digital materials remain usable in the future, access to them
is required - and not simply access, but access to ‘all qualities
of authenticity, accuracy and functionality’. This, in turn,
requires definitions of authenticity, well expressed by the
UNESCO Guidelines as the ‘quality of genuineness and
trustworthiness of some digital materials, as being what they
purport to be, either as an original object or as a reliable copy
derived by fully documented processes from an original’. Four
further definitions in the UNESCO Guidelines clarify and
emphasize these requirements (see Table 7.2). Digital
materials can be considered as physical objects, logical
objects, or conceptual objects. The physical object is the
artifact - for example, the diskette, the CD, or the magnetic
tape - whose physical characteristics store in or on it the
bit-stream, that is, the logical object. These are given sense
when they are used by humans and are labeled as conceptual
objects: ‘what we deal with in the real world’. Essential
elements or significant properties of digital materials, when
taken together, enable us to re-present the materials in the
manner in which they were originally intended; that is, to
preserve them.
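

The difference between a logical object and a conceptual object can be
shown with a few lines of Python. The example is a sketch under simple
assumptions: the sample word and the two character encodings are chosen
only for illustration. The same sequence of bytes (the logical object)
is interpreted in two ways, and only one interpretation gives back the
text that a reader was meant to see (the conceptual object).

```python
# One logical object: a fixed sequence of bytes.
logical_object = "café".encode("utf-8")

# Two attempts to re-present it as a conceptual object.
as_utf8 = logical_object.decode("utf-8")      # the intended interpretation
as_latin1 = logical_object.decode("latin-1")  # a wrong interpretation that raises no error

print(len(logical_object))   # number of bytes stored
print(as_utf8)
print(as_latin1)
print(as_utf8 == as_latin1)  # the bytes are identical, the meaning is not
```

Nothing in the bytes themselves says which interpretation is correct,
which is why the encoding must be recorded as one of the essential
elements of the object.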


7.1.2. How long are we Preserving them for? 


Although the definitions in the UNESCO Guidelines are 
not very helpful in providing guidance about the length of time 
we preserve digital materials, Jones and Beagrie assist with 
their articulation of long-term, medium-term and short-term 
preservation. Long-term preservation aims to provide 
indefinite access to digital materials, or at least to the 
information contained in them. Medium-term preservation is
continued access to digital materials for a defined time, but
not indefinitely; here, the time period is long enough to
encompass changes in technology. Short-term preservation
is, in part, defined by changes in technology: access to digital
materials is maintained until technological changes make it
inaccessible, or for a period during which the material is likely
to be in use but which is relatively short. Such definitions
provide helpful ways of thinking about digital preservation
programmes - for example, about resource allocation, since the
resource implications of embarking on long-term
preservation are ongoing and therefore large.
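

The three terms can be expressed as a small decision rule. The sketch
below is a simplification for illustration only: reducing the
definitions to two yes/no questions is an assumption, and the names
Term and preservation_term are hypothetical rather than taken from
Jones and Beagrie.

```python
from enum import Enum


class Term(Enum):
    LONG = "long-term"
    MEDIUM = "medium-term"
    SHORT = "short-term"


def preservation_term(indefinite: bool, survives_technology_change: bool) -> Term:
    """Classify a preservation commitment using the two questions above."""
    if indefinite:
        return Term.LONG      # access is to continue indefinitely
    if survives_technology_change:
        return Term.MEDIUM    # defined period, but beyond changes in technology
    return Term.SHORT         # ends when use stops or technology changes


print(preservation_term(indefinite=True, survives_technology_change=True).value)
print(preservation_term(indefinite=False, survives_technology_change=True).value)
print(preservation_term(indefinite=False, survives_technology_change=False).value)
```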


7.1.3. What strategies and actions do we need to apply when 
we preserve them? 


The definitions in these two sources make us aware, in 
a general sense, of the components of a digital preservation 
programme. In order to achieve the aim of a digital
preservation programme - ‘maintaining accessibility of digital
objects over time’, or ‘to ensure continued access to digital
materials for as long as necessary’ - various processes are
needed, forming a ‘series of managed activities’.


7.2. PRINCIPLES OF PRESERVATION 


In the past two decades, a consensus has emerged within a
community of practitioners about a set of fundamental principles
that should govern the management of available resources in a
mature preservation programme. The
principles of preservation in the digital world are the same as
those of the analog world, and, in essence, define the priorities
for extending the useful life of information resources. These
concepts are longevity, choice, quality, integrity, and
accessibility. Preservation in the digital world is one of the
central leadership issues of the day. It is the shared responsibility
of many people in many institutions fulfilling many roles.


An understanding of the impact of this role differentiation 
on digital preservation action is crucial. Role differentiation 
helps archivists and librarians, acting as digital product
developers, know when to control their use of digital
technologies, when they need to influence trends, and when
they need to relinquish any expectation for either control or
influence. 


7.2.1. The Transformation of Longevity 


The central concern in traditional preservation practice 
is the media upon which information is stored. The top priority 
is extending the life of paper, film, and magnetic tape by 
stabilising their structures and limiting the ability of internal 
and external factors to cause deterioration. The focus on 
external factors has led to specifications for proper 
environmental controls, care and handling guidelines, and 
disaster recovery procedures. Progress on efforts to control 
or mitigate the internal factors of deterioration has resulted in 
alkaline paper standards, archival quality microfilm, mass 
deacidification, and more rugged magnetic media. 


And yet, now that archivists and librarians have defined 
the issues surrounding the life expectancy of storage media, 
the very concept of permanence that has driven the search 
for “archival” media is fading as a meaningful intellectual 
construct for preservation. Preservation in the digital context 
has little concern for the longevity of optical disks and newer, 
more fragile storage media. The viability of digital image files 
depends far more on the life expectancy of the access system,
a chain only as strong as its weakest component.


Today's optical media most likely will far outlast the 
capability of systems to retrieve and interpret the data stored 
on them. Since it can never be known for certain when a 
system cannot be maintained or supported by a vendor, 
product developers must anticipate that valuable image data, 
indexes, and software will be migrated in their professional 
lifetimes to future generations of the technology. Digital 
managers can exercise a large measure of control over the 
longevity of digital image data through the careful selection, 
handling, and storage of rugged, well-tested storage media.
They can influence the life expectancy of the information by
making sure that local budgetary commitments are made 
consistently at an appropriate level. Ultimately, they have no 
control over the evolution of the imaging marketplace, 
especially corporate research and development activities that 
have a tremendous impact on the life expectancy of the digital 
systems created today. 


7.2.2. The Transformation of Choice 


Choice is selection. Choice involves defining value, 
recognizing it in something, and then deciding to address its 
preservation needs in the way most appropriate to that value. 
Over decades the act of preservation has evolved from saving 
material from oblivion and assembling it in secure buildings 
to more sophisticated assessing of condition and value on 
already-collected materials. Preservation selection has largely 
been driven by the need to stretch limited resources in as 
wise a fashion as possible, resulting in the dictum that “no 
item shall be preserved twice.” The net result is a growing 
virtual special collection of items preserved with a variety of 
techniques, most notably by reformatting on microfilm. 


Selection is perhaps the most difficult of undertakings 
precisely because it is static and conceived by practitioners 
as either completely divorced from present use or completely 
driven by demand. Selection in the digital world is not a choice 
made once and for all near the end of an item’s life cycle, but 
rather is an ongoing process intimately connected to the active 
use of the digital files. The value judgments applied when 
making a decision to convert documents from paper or film 
to digital images are valid only within the context of the original 
system. It is a rare collection of digital files, indeed, that can 
justify the cost of a comprehensive migration strategy without
factoring in the larger intellectual context of related digital files
stored elsewhere and their combined uses for teaching and
learning.


Even while recognizing that selection decisions cannot 
be made autonomously or in a vacuum, librarians and 
archivists can choose which books, articles, photographs, film, 
and other materials are converted from paper or film into digital 
image form. Influence over the continuing value of digital 
image files is largely vested in the right to decide when it is 
time to migrate image data to a future storage and access 
system and when a digital file has outlived its usefulness to 
the institution charged with preserving it. What digital product 
developers cannot control is the impact of their ongoing value 
judgments on the abilities of readers to find and use 
information in digital form. Unused digital products might as 
well not exist; they certainly will not survive for long as mere 
artifacts of the conversion process. 


7.2.3. The Transformation of Quality 


Maximizing the quality of all work performed is such an 
important maxim in the preservation field that few people state 
this fundamental principle directly. Instead, the preservation 
literature dictates high quality outcomes by specifying 
standards for treatment options, reformatting processes, and 
preventive measures. The commitment to quality standards
(do it once, do it right) permeates all preservation activity,
including library binding standards, archival microfilm creation 
guidelines, conservation treatment procedures, the choice 
of supplies and materials, and a low tolerance for error. The 
evolution of preservation microfilming as a central strategy 
for the bulk of brittle library materials has placed the quality 
of the medium and the quality of the visual image on an equal 
plane. 


In the pursuit of quality microfilm, compromise on visual 
truth and archival stability is dictated largely by the 
characteristics of the item chosen for preservation. Quality in 
the digital world, on the other hand, is conditioned significantly 
by the limitations of capture and display technology. Digital
conversion places less emphasis on obtaining a faithful
reproduction of the original in favor of finding the best 
representation of the original with a given technology. The 
mechanisms and techniques for judging the quality of digital 
reproductions are different and more sophisticated than those 
for assessing microfilm or photocopy reproductions. 


Additionally, the primary goal of preservation quality is 
to capture as much intellectual and visual content as is 
technically possible and then present that content to end- 
users in ways most appropriate to their needs. The image 
market has subsumed the principle of maximum quality to 
the “solution” that finds the minimum level of quality 
acceptable to today’s system users. Digital product 
developers must reclaim image quality as the heart and soul 
of preservation. This means maximizing the amount of data 
captured in the digital scanning process, documenting image 
enhancement techniques, and specifying file compression 
routines that do not result in the loss of data during 
telecommunication. 
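

Whether a compression routine loses data can be tested directly by
compressing a file, restoring it, and comparing the result with the
original. The following Python sketch uses the zlib module of the
standard library as one example of a lossless routine; it is not an
image format, and the function name and the sample data are
hypothetical stand-ins for a real scanned image.

```python
import zlib


def is_lossless_round_trip(original: bytes) -> bool:
    """Compress and restore the data, then check that nothing was lost."""
    compressed = zlib.compress(original, 9)   # 9 = strongest compression level
    restored = zlib.decompress(compressed)
    return restored == original


# Stand-in for scanned image data: a repeating pattern of byte values.
scan = bytes(range(256)) * 64
compressed_scan = zlib.compress(scan, 9)

print(len(scan), len(compressed_scan))   # size before and after compression
print(is_lossless_round_trip(scan))
```

A lossy routine would fail this comparison, because the restored file
would differ from the one that was captured.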


The control of digital quality standards is possible now, 
just as it is for microfilm. However, librarians and archivists 
can only influence the development of standards for data 
compression, communication, display, and output. 
Improvements in the technical capabilities of image 
conversion hardware and software are in the hands of the 
imaging industry. 


7.2.4. The Transformation of Integrity


The concept of integrity has two dimensions in the 
traditional preservation context— physical and intellectual both 
of which concern the nature of the evidence contained in the 
document. Physical integrity largely concerns the item as 
artifact. It plays out most directly in the conservation studio, 
where skilled bench staff use water-soluble glues, age-old
hand-binding techniques, and high quality materials to protect 
historical evidence of use, past conservation treatments, and 
intended or unintended changes to the structure of the item. 
The preservation of intellectual integrity is based upon 
concern for evidence of a different sort. The authenticity, or 
truthfulness, of the information content of an item, maintained 
through documentation of both provenance (the chain of
ownership) and treatment, where appropriate, is at the heart
of intellectual integrity. 


Beyond the history of an item is concern for protecting 
and documenting the relationships among items in a 
collection. In traditional preservation practice, the concepts 
of quality and integrity reinforce each other. In the digital world, 
maintaining the physical integrity of a digital image file has 
far less to do with the media than with the loss of information 
when a file is created originally, then compressed 
mathematically, stored in various formats, and sent across a 
network. In the domain of intellectual integrity, structural 
indexes and data descriptions traditionally published with an 
item as tables of contents or prepared as discrete finding aids 
or bibliographic records must be inextricably linked and 
preserved along with the digital image files themselves. 


Preserving intellectual integrity also involves 
authentication procedures, like audit trails, that make sure 
files are not altered intentionally or accidentally. Ultimately, 
the digital world fundamentally transforms traditional 
preservation principles from guaranteeing the physical 
integrity of the object to specifying the creation of the object 
whose intellectual integrity is its primary characteristic. 
Librarians and archivists can exercise control over the
integrity of digital image files by authenticating access 
procedures and documenting successive modifications to a 
given digital record. 
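

An audit trail of this kind can be sketched in a few lines of Python. The sketch is only an illustration under stated assumptions: the class name AuditTrail, its methods, and the identifiers used in the usage note are hypothetical and are not drawn from any particular library system. Each entry records who did what to a file together with a SHA-256 digest of the file's content, and each entry also stores the digest of the previous entry, so that later tampering with either the file or the log itself can be detected.

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class AuditTrail:
    """Hypothetical append-only log; each entry is chained to the previous one."""
    entries: list = field(default_factory=list)

    def record(self, file_id: str, content: bytes, agent: str, action: str) -> dict:
        """Document one creation or modification of a digital file."""
        prev = self.entries[-1]["entry_hash"] if self.entries else ""
        body = {
            "file_id": file_id,
            "action": action,
            "agent": agent,
            "content_sha256": sha256_hex(content),
            "prev_hash": prev,
        }
        # The entry hash covers every field above, including the link to the previous entry.
        body["entry_hash"] = sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))
        self.entries.append(body)
        return body

    def log_is_intact(self) -> bool:
        """Recompute every entry hash and check the chain from the first entry onward."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if entry["prev_hash"] != prev:
                return False
            expected = sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))
            if expected != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True

    def file_matches_log(self, file_id: str, content: bytes) -> bool:
        """Check a file against the most recent digest recorded for it."""
        for entry in reversed(self.entries):
            if entry["file_id"] == file_id:
                return entry["content_sha256"] == sha256_hex(content)
        return False
```

In use, a repository might call record() once when an image file is created and again after each documented enhancement, then call file_matches_log() before delivering the file to confirm that it has not been altered outside the documented procedure.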


They can also create and maintain structural indexes
and bibliographic linkages within well-developed and well- 
understood database standards. Digital product developers 
also have a role to play in influencing the development of 
metadata interchange standards including the tools and 
techniques that will allow structured, documented, and 
standardized information about data files and databases to 
be shared across platforms, systems, and international 
boundaries. It is vain to think, however, that librarians and 
archivists are anything but bystanders observing the rapid 
development of network protocols, bandwidth, or the data 
security techniques that are essential to the persistence of 
digital objects over time. 


In the fifty years that preservation has been emerging 
as a professional specialty in libraries and archives, the 
preservation and access responsibilities of an archive or 
library have often been in tension. “While preservation is a 
primary goal or responsibility, an equally compelling mandate,
access and use, sets up a classic conflict that must be
arbitrated by the custodians and caretakers of archival 
records,” states a fundamental textbook in the field. The 
intimate relationship between preservation and access has 
changed in ways that mirror the technological environment 
of cultural institutions. 


7.2.5. Preservation or Access 


In the early years of modern archival agencies, prior to
World War II, preservation simply meant collecting.
The sheer act of pulling a collection of manuscripts from a 
barn, a basement, or a parking garage and placing it intact in 
a dry building with locks on the door fulfilled the fundamental 
preservation mandate of the institution. In this regard 
preservation and access are mutually exclusive activities. Use 
exposes a collection to risk of theft, damage, or misuse of 
either content or object. The safest way to ensure that a book 
lasts for a long time is to lock it up or make a copy for use. 


7.2.6. Preservation and Access 


Modern preservation management strategies posit that 
preservation and access are mutually reinforcing ideas. 
Preservation action is taken on an item so that it may be used. 
In this view, creating a preservation copy on microfilm of a 
deteriorated book without making it possible to find the film is 
a waste of money. In the world of preservation and access, 
however, it is theoretically possible to fulfill a preservation 
need without solving access problems. Conversely, access
to scholarly materials can be guaranteed for a very long 
period, indeed, without taking any concrete preservation 
action on them. 


7.2.7. Preservation is Access 


Librarians and archivists concerned about the 
preservation of electronic records sometimes view the two 
concepts as cause and effect. The act of preserving makes 
access possible. Equating preservation with access, however, 
implies that preservation is defined by availability, when 
indeed this construct may be getting it backwards. 
Preservation is no more access than access is preservation. 
Simply refocusing the preservation issue on access 
oversimplifies the preservation issues by suggesting that 
access is the engine of preservation without addressing the 
nature of the thing being preserved. 


7.2.8. Preservation of Access 


In the digital world, preservation is the action and access 
is the thing: the act of preserving access. A more accurate
construct simply states “preserve accessibility.” When 
transformed in this way, a whole new series of complexities 
arises. The content, structure, and integrity of the digital 
product assume center stage, and the ability of a machine to
transport and display this product becomes an assumed end
result of the preservation action rather than its primary goal. 
Control over accessibility, especially the capacity of the 
system to export digital image files and associated indexes
to future generations of the technology, can be exercised in
part through prudent purchases of only nonproprietary 
hardware and software components. In the present 
environment, true plug-and-play components are more widely 
available. 


The financial commitment by librarians and archivists 
is one of the only incentives that vendors have to adopt open 
system architectures or at least provide better documentation 
on the inner workings of their systems. Additionally, librarians 
and archivists can influence vendors and manufacturers to 
provide new equipment that is backward compatible with 
existing systems. This capability assists image file system 
migration in the same way that today’s word processing 
software allows access to documents created with earlier 
versions. Much as they might wish otherwise, digital product 
developers have little or no control over the life expectancy 
of a given digital image system and the decision to abandon 
that system. 


7.3. THIRTEEN WAYS FOR DIGITAL PRESERVATION 


Now the focus of digital preservation has shifted away 
from the need to take immediate action to rescue threatened 
materials and toward the realization that perpetuating digital
materials over the long term involves the observance of 
careful digital-asset-management practices diffused 
throughout the information life cycle. This in turn requires us 
to look at digital preservation not just as a mechanism for 
ensuring bit sequences created today are renderable
tomorrow but as a process operating in concert with the full
range of services supporting digital-information environments 
as well as the overarching economic, legal, and social
contexts. So, we must look at digital preservation in many 
different ways. Lavoie and Dempsey have suggested thirteen
ways of looking at digital preservation. These are:


7.3.1. Digital Preservation as an Ongoing Activity 


Preservation traditionally proceeds in fits and starts, with 
extended periods of inactivity punctuated by bursts of 
intensive effort—witness the brittle-book campaigns of the 
1980s or recent efforts to save movies filmed on nitrate 
cellulose film stock. The pattern is one in which materials are 
left to approach a state of crisis, at which point the situation is 
remedied through large-scale intervention. 


But digital materials generally do not afford the luxury 
of procrastination. The fragility of digital storage media, 
combined with a high degree of technology dependence, 
considerably shortens the grace period during which 
preservation decisions can be deferred. Issues of long-term
persistence can arise as early as the moment digital
materials are created, for example, in choosing between a
widely used, stable digital format and one that is obscure or
on the verge of obsolescence. This sense of urgency is driven
largely by the fact that it is problematic to apply digital-
preservation techniques ex post, that is, after deterioration
has set in. While a print book with a broken spine can be 
easily rebound, a digital object that has become corrupted or 
obsolete is often impossible to restore. Digital-preservation 
techniques are most effective when they are preemptive. 


This suggests that as more and more digital materials 
come under the stewardship of collecting institutions, 
preservation will become less like an event, occurring at 
discrete intervals, and more like a process, proceeding 
relatively continuously over time. As a consequence, it will 
become more difficult to distinguish preservation activities 
from the routine, day-to-day management of digital materials. 


It is important that the sudden ubiquity of preservation 
processes in digital-collection management does not interfere 
unduly with other components of the digital-information 
environment. Implementation of preservation measures 
should be as transparent as possible to users of digital 
materials and should not represent obstacles to access and 
use. In the print world, preservation of rare book collections 
is achieved in part by restricting usage — materials are 
accessed under the supervision of a librarian and off-premises 
circulation is prohibited. While these measures undoubtedly 
prolong the life of these valuable materials, they do little to 
promote their use. In the case of digital materials, mechanisms 
to ensure long-term persistence should operate harmoniously 
with mechanisms supporting dissemination and use. 


7.3.2. Digital Preservation as a Set of Agreed Outcomes 


It is one thing to recognize that actions must be taken 
to secure the long-term persistence of digital materials. It is 
another to articulate precisely what the outcome of 
preservation should be. This issue is not confined to digital 
materials. Nicholson Baker, for example, has decried 
reformatting efforts that result in the loss of the original item. 
For Baker, the preservation of the original is the measure of successful
preservation. To others, however, destructive microfilming 
meets their preservation needs in that content is transferred 
to a medium with a life expectancy of half a millennium. 


Similar questions are attached to the preservation of 
digital materials, but the issues involved are amplified. Digital 
content often embodies a degree of structural complexity not 
found in physical materials. It can subsume multiple formats, 
being at once text, images, animations, sound, and video; it 
can be interactive, providing tools for the user to create 
alternative views of the content or link to new content; it is
mutable, in that it can be updated or enhanced over time; it
can be broken apart, with the pieces distributed and used
individually or recombined to create new resources. In short, 
digital content can incorporate features with no equivalent in 
the analog world. How many of these features can or should 
be preserved? 


Unfortunately, there is no single answer to this question. 
For some purposes, a preserved digital object must be a 
perfect surrogate for the original, replicating the full range of 
functionality as well as the original look and feel. But for other 
purposes, intensive preservation of this kind is unnecessary; 
perpetuating the object's intellectual content alone, or even 
a diminished approximation of the original object, is enough. 
For some, nothing less than retention in perpetuity constitutes 
successful preservation; while for others, a finite period is 
sufficient. 


These considerations suggest that the choice of 
preservation strategy will need to reflect a consensus of all 
stakeholders associated with the archived digital materials. 
Achieving such a consensus is difficult, and in some 
circumstances, impossible. A second-best solution is for the 
digital repository to articulate clearly what outcomes can be 
expected from the preservation process. These outcomes 
should in turn be understood and validated by stakeholders. 
Communication between the repository and stakeholders, 
either to promote consensus on preservation outcomes or 
for the repository to disclose and explain its preservation 
policies, mitigates the risk that the repository’s commitments 
are misaligned with stakeholder expectations. 


7.3.3. Digital Preservation as an Understood Responsibility 


The likelihood that digital-preservation activities will 
proceed continuously throughout the information life cycle 
suggests that preservation responsibilities will extend beyond
traditional stewards of the scholarly and cultural record. If, 
for example, preservation considerations must be taken into
account at the time of a digital object’s creation, it is authors 
and publishers, rather than libraries and archives, who must 
take the first steps toward securing the long-term persistence 
of digital materials. 


The need for entities beyond collecting institutions to 
play a role in preservation is not new: The publishing industry, 
in response to the brittle-books crisis, recognized and acted 
on the necessity to produce printed materials on acid-free 
paper. In the digital realm, entities who do not regard 
preservation as part of their organizational mission will find 
the scope for their involvement in the preservation process 
greatly expanded. Consequently, the responsibility for 
undertaking preservation will become much more diffused. 


The rapid take-up of networked digital resources, 
obtained through license or subscription, has led to portions 
of the scholarly and cultural record—for example, electronic 
journals, e-books, and websites—lying outside the custody of 
collecting institutions. This has prompted anxiety about the 
long-term stewardship of these materials, in particular when 
economic value has diminished while cultural importance has 
not. Since the value of certain digital materials can persist 
indefinitely, those who have custody of these materials during 
the various stages of the information life cycle must recognize 
and act upon the need to manage them in ways compatible 
with long-term preservation. 


The division of labour for preserving print materials is
well established. The division of labour in regard to digital
preservation has yet to be determined; for example,
clarification of legal deposit requirements for digital materials
will be a key factor in determining how much of the digital-
preservation burden will be allocated to national libraries or
archiving agencies. But the distribution of digital-preservation 
responsibilities is almost certain to include decision makers 
outside the cultural-heritage community. It is important that
these decision makers understand the necessity of taking 
steps to secure the long-term persistence of the digital 
materials under their control. 


7.3.4. Digital Preservation as a Selection Process 


The preservation of print materials is both a benign by- 
product of production and distribution modes and a process 
of active decision making and intervention. Preservation of 
digital materials will reflect a similar mix, although the dividing 
line between benign by-product and active decision making 
remains to be drawn. But as the volume of information in
digital form continues to expand rapidly, an issue emerges
that will surely require active decision making and
intervention: what should be preserved?


It is safe to assume that preserving everything is not 
an option. Digital preservation is expensive, and it is therefore 
impractical to make every bit of information in digital form the 
subject of active preservation measures throughout its entire 
life cycle. Two options remain. One is to collect as
many digital materials as possible and deposit them into 
mass-storage systems. The stored materials could then be 
sifted over time, with selections for more intensive 
preservation periodically made as need or interest arises. 


The save-now, preserve-later strategy is feasible only 
through the unique characteristics of digital information, where 
the steady decline in storage cost makes it conceivable to 
save everything. The chief criticism of this approach is 
summarized by the adage “Saving is not preserving”; there 
is considerable uncertainty concerning the extent to which 
preservation techniques can be applied retrospectively to 
digital materials that have resided untouched in storage for
long periods of time.


The second strategy is selection, that is, determining 
from the outset which digital materials should be preserved
and taking steps to curate them throughout their life cycles. 
The choice of which materials to preserve is a difficult one 
and will depend on a number of factors, including institutional 
mission, cultural preferences, economic practicality, and risk- 
management policies. The question will also hinge on the 
digital medium’s impact on the scholarly and cultural records. 
Is an e-mail discussion list, for example, part of the scholarly 
record, and if so, should it be preserved with as much care 
as the contents of a peer-reviewed journal? 


Selection is not just a preserve-or-not-preserve issue. 
It also involves the level of desirable intervention for a 
particular set of digital materials. Is it necessary to go to the 
trouble and expense of preserving a digital object in its original 
form? Or is preservation of the intellectual content enough? 
This issue presents difficult choices, but in a world of scarce 
preservation resources, these choices must be confronted. 


7.3.5. Digital Preservation as an Economically Sustainable
Activity


Two key economic challenges plague efforts to
preserve digital materials. First, allocation of funds to digital 
preservation has been insufficient. Neil Beagrie has observed 
that in the context of funding decisions, the need to take 
immediate and frequent actions to preserve digital collections 
usually is overshadowed by the desire to create and 
disseminate new forms of digital content. Second, funds that 
are made available are usually provided on a temporary basis, 
often as grants to support one-off undertakings or special 
projects. Few institutions have allocated ongoing, budgeted 
resources for the long-term care of digital materials. 


The impulse to fund digital-preservation activities is 
dampened by the expectation that the costs will be formidable. 
It is difficult to forecast the precise magnitude of these costs, 
which will depend on factors such as system architecture,
length of archival retention, scale, and preservation strategy. 
But digital-preservation activities will require a substantial 
resource commitment to sustain them over time irrespective 
of their form.


Economic sustainability is the ability to marshal 
sufficient resources, on an ongoing basis, to meet 
preservation objectives. There are many avenues by which 
sustainability can be achieved. An institutional commitment 
to budget a continuous supply of funds to support digital 
preservation is one; these funds might be used to extend a 
pilot project originally funded through seed money from a 
grant-giving organization. Digital-preservation activities might 
also be self-sustaining, generating revenues as a byproduct 
of day-to-day operations. In these circumstances, economic 
sustainability might be defined in terms of cost recovery or a 
minimum level of profitability. 


Strategies for attaining economic sustainability must be 
built on a sound empirical footing; consequently, much more 
data on the costs of digital preservation are needed. Digital 
preservation is still in its infancy, and much of the available 
data are heavily skewed toward up-front costs: reformatting,
setting up the digital repository, ingestion of materials, and 
so forth. As projects mature, empirical descriptions of digital 
preservation’s complete cost trajectory will emerge. These 
data must be consolidated and synthesized to produce 
reasonable benchmark estimates of the cost requirements 
associated with various forms of digital preservation. 


7.3.6. Digital Preservation as a Cooperative Effort 


The facts that digital preservation is expensive, funding 
is scarce, and preservation responsibilities are diffused 
suggest that digital-preservation activities would benefit from 
cooperation. Cooperation can enhance the productive 
capacity of a limited supply of digital-preservation funds by
building shared resources, eliminating redundancies, and 
exploiting economies of scale. 


In order to persuade institutions to invest in bringing 
digital collections online and to make these collections a 
meaningful part of research and learning experiences, there 
must be assurance that the collections will persist. But long- 
term stewardship may be beyond the means of an individual 
institution. Aggregating collections into union archives, 
maintained and funded as a shared community resource, 
would serve the dual function of promoting shared access 
and distributing the costs of long-term maintenance over a 
larger stakeholder community. The fact that both the benefits 
of access and the costs of long-term maintenance are shared 
by a large number of institutions would furnish a strong 
incentive to contribute materials to these shared digital 
collections. 


Cooperation would also minimize redundancy. The 
characteristics of digital information are such that relatively 
few archived copies of a digital resource will likely be required 
to meet preservation objectives. The rationale for this 
assertion is easy to frame. Sharing analog materials is 
generally more expensive than sharing digital materials; to 
access an archived copy of a print book, users must either 
travel to the book’s location or request that the book be 
shipped via interlibrary loan. To reduce access costs, it is 
desirable to preserve many copies of the same print book in 
geographically dispersed locations. In contrast, the ease with 
which digital information can be replicated and shared over 
networks suggests greater scope for preserving a particular 
digital resource in a single location rather than preserving
copies in multiple locations. This can introduce significant cost
savings by minimizing the incidence of redundant, fragmented 
efforts; multiple learning curves; and reinvention of wheels. 


Finally, cooperation opens possibilities for realizing
greater efficiencies through economies of scale. Maintaining 
digital materials over the long term will require an elaborate 
and costly technical infrastructure as well as specialized 
human expertise. It is economically impractical for every 
collecting institution to develop local digital-preservation 
capabilities. A coordinated approach promises to be more
cost-effective by spreading fixed costs over a greater number 
of institutions. It also might make certain kinds of highly 
specialized, or niche, digital-preservation activities 
economically feasible by expanding them to a sufficiently large 
scale to bring costs in line with benefits. These activities might 
be impractical if done piecemeal on a small scale. 
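

The arithmetic behind spreading fixed costs can be made concrete with a small calculation. The following Python sketch assumes a deliberately simple cost model, in which one fixed infrastructure cost is shared equally among participating institutions and each institution also bears its own running cost. The function name and all figures are hypothetical and serve only to illustrate the principle; real preservation costs are unlikely to divide this neatly.

```python
def cost_per_institution(fixed_cost: float, running_cost_each: float, members: int) -> float:
    """Share one fixed infrastructure cost equally and add each member's own running cost."""
    if members < 1:
        raise ValueError("members must be at least 1")
    return fixed_cost / members + running_cost_each


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
FIXED_COST = 100_000.0       # shared repository infrastructure and specialist staff
RUNNING_COST_EACH = 5_000.0  # each institution's own ingest and storage costs

for members in (1, 5, 20):
    share = cost_per_institution(FIXED_COST, RUNNING_COST_EACH, members)
    print(f"{members:>2} institution(s): {share:,.0f} each")
```

Under these assumed figures the share borne by each institution falls steeply as membership grows, while the running cost sets a floor below which cooperation alone cannot push the total.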


7.3.7. Digital Preservation as an Innocuous Activity 


In some circumstances, digital preservation is perceived 
as a threat to intellectual property rights. Much of this 
resistance can be attributed to the current ambiguity 
surrounding copyright law as it pertains to digital materials; 
the principles of fair use and legal deposit are in particular 
need of clarification. 


Digital materials purchased through license or 
subscription, such as electronic journals or e-books, illustrate 
the collision between the need to intervene to preserve digital 
materials and the need to protect intellectual property rights. 
These materials are typically accessed over the Web through 
a central server controlled by the content provider rather than 
through locally maintained copies. In these circumstances, 
the entities that perceive the need to preserve, that is,
collecting institutions, are often distinct from the entities that
hold the right to preserve as well as custody of the materials.
Publishers are reluctant to distribute digital copies of their
revenue-generating assets, even for preservation purposes,
to individual licensees or subscribers; few institutions would
have the resources to preserve the materials even if they
did.


This presents two options: content providers must be
persuaded or enjoined to preserve the materials in their 
custody, or alternatively, content providers must cede the right 
to preserve to another entity who is willing and able to assume 
responsibility for preservation. Currently, the latter approach 
seems to be in ascendance, evidenced by the emergence of 
escrow repositories or archives of last resort. For example, 
the publisher Elsevier has agreed to transfer a copy of the 
content available through its Science Direct service to the 
National Library of the Netherlands with the understanding 
that the library will maintain this material in perpetuity and 
assume the responsibility for making it available should 
circumstances prevent Elsevier from doing so through its own 
systems. 


Other issues remain to be resolved. In order to meet 
preservation objectives, the archiving agency may have to 
alter the archived content in some way, for example, by
migrating it to another format in order to keep pace with changing
technologies or by disaggregating objects into more granular
resources, such as breaking up a journal into its constituent
articles. In these circumstances, appropriate permissions
must be obtained from the rights holders in order to give the
repository sufficient control over the archived materials to
carry out its preservation responsibilities.


A balance between the interests of content providers
and collecting institutions is best achieved through
appropriately designed contracts. In the United States,
copyright law is generally superseded by contract law;
therefore, regardless of current interpretations of fair use or
legal deposit, all stakeholders in a set of digital materials may
address preservation requirements through provisions 
included in licensing or subscription agreements. An example 
of this is found in the United Kingdom’s model license 
governing digital materials licensed to UK institutions of higher
education. The model license includes archiving clauses that 
identify the need for libraries to have continued access to 
purchased materials following the license’s expiration and 
commits the publisher to address this need as part of the 
licensing agreement. 


7.3.8. Digital Preservation as an Aggregated or 
Disaggregated Service 


For the most part, digital-preservation systems have 
been designed holistically, combining raw storage capacity, 
ingest functions, metadata collection and management, 
preservation strategies, and dissemination of archived content 
into a physically integrated and centrally administered system. 
But other organizational structures are also possible — for 
example, digital-preservation activities might adopt a 
disaggregated approach in which the various components of 
the preservation process are divided into separate services 
distributed over multiple organizations, each specializing in 
a focused segment of the overall process. 


A digital-preservation system can be deconstructed into 
several functional layers. The bottom layer includes hardware, 
software, and network infrastructure supporting the storage 
and distribution of digital content. The next layer includes more 
specialized services to manage the archived content residing 
in the system, including metadata creation and management 
and validation of materials’ authenticity or integrity. The 
preservation measures are implemented in the next layer of 
services, including monitoring the repository’s environment 
for changes that could impact the ability to access and use 
archived content as well as initiating processes such as 
migration or emulation to counteract these changes. The
topmost layer includes services that support browsing or
searching, access requests, validating access permissions,
and arranging for delivery.


This range of functions can be offered as separate yet 
interoperable services that can be combined in various ways 
to support different forms of repository activities. For example, 
some digital materials might require only bit preservation— 
that is, an assurance that the bit streams constituting the 
digital objects remain intact and recoverable over the long 
term. Other materials, however, may require more 
sophisticated preservation services, such as migration to new 
formats or the creation of emulators to reproduce the content’s 
original look, feel, and functionality. Some preservation efforts 
will require active archives, characterized by a relatively 
continuous process of ingest and access, while other efforts
might submit materials for preservation at irregular and widely
spaced intervals, with little or no user access. 
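

Bit preservation, the simplest of the services mentioned above, lends itself to a short illustration. The Python sketch below assumes that a repository keeps several copies of an object at different storage sites and recorded a SHA-256 digest of the object at ingest; both assumptions, like the function and site names, are hypothetical and are not taken from any specific repository system. The first function checks whether each copy is still intact, and the second restores damaged copies from a copy that still matches the recorded digest.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: dict[str, bytes], expected_sha256: str) -> dict[str, bool]:
    """Report, for each storage site, whether its copy still matches the recorded digest."""
    return {site: digest(data) == expected_sha256 for site, data in replicas.items()}


def repair(replicas: dict[str, bytes], expected_sha256: str) -> dict[str, bytes]:
    """Replace every copy with one whose digest is still correct."""
    good = next((d for d in replicas.values() if digest(d) == expected_sha256), None)
    if good is None:
        # Without at least one intact copy the original bit stream cannot be recovered.
        raise RuntimeError("no intact copy remains")
    return {site: good for site in replicas}
```

A sketch of this kind says nothing about whether the preserved bits can still be rendered; that is the concern of the migration and emulation services, which would sit on top of it.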


These preservation activities may utilize various 
combinations of some or all of the services. A fully integrated 
system may find that one or more services end up 
underutilized and therefore of insufficient scale to realize 
technical or cost efficiencies. On the other hand, entities that 
specialize in only a few of these services may be able to 
spread them over a larger collection of digital materials. 
This is Adam Smith's classic argument for specialization in production, or
a division of labour. Determining the extent to which digital
preservation can benefit from a division of labour, in the sense
of finding (1) a sensible deconstruction of the digital-
preservation process into a set of more granular services and
(2) the optimal degree of specialization across preserving
institutions, is a key issue in the design of digital-
repository architectures.


7.3.9. Digital Preservation as a Complement to Other Library
Services


It is not too early to begin thinking about how digital-
preservation mechanisms will be integrated with, and operate
alongside, the wide range of other services that, taken
together, constitute a digital library, although much work
remains to be done to resolve the challenges specific to 
preserving digital materials. 


The notion of dark archives, supporting little or no 
access to archived materials, has met with scant enthusiasm 
in the library community. This suggests that digital repositories 
will function not just as guarantors of the long-term viability of 
materials in their custody but also as access gateways. This 
dual mission requires that preservation processes operate 
seamlessly alongside access services. Preservation
should not impede access or reduce the scope for sharing
information. Careful records of the outcome of preservation 
processes must be kept; for example, in cases where material 
is migrated to new formats, users must understand which 
versions of a particular digital resource are available for 
access and what alterations, if any, have been made to these 
versions as a consequence of preservation. 


As preservation assumes a more prominent role in the 
day-to-day management of digital collections, preservation 
activities will coexist and, at times, operate in concert with 
other routine collection management functions, such as 
acquisition, description, and inter-library loan fulfillment. When 
a new digital resource is acquired, it is simultaneously 
ingested by the digital repository’s archival system. At the 
same time that the resource is being prepared for circulation, 
it must also be prepared for long-term retention. Not only must
the resource be surfaced in the library's access environments,
for example through a new record in the OPAC, but it must
also be surfaced in the library’s preservation system. Digital
content management systems must find ways to integrate 
preservation tools and services into their environments. 


It is essential that preservation actions be as transparent
as possible to users of archived digital materials. It would be 
unfortunate if the preservation process reduced the scope
for sharing digital materials across systems, institutions, and 
users. In the print world, preservation often exacts a heavy 
toll on users’ ability to access material when books are 
removed from the shelves while they are re-bound, filmed, or 
scanned; when rigorous restrictions are placed on circulation; 
and when materials are taken out of circulation entirely. The 
characteristics of digital information are such that archived
materials can be accessed and used without compromising
on quality. Achieving this in practice requires explicit
recognition of the impact of preservation on access (and vice
versa) in the design and implementation of digital library 
systems. 


7.3.10. Digital Preservation as a Well-understood Process 


There is as yet little consensus on best practice for
carrying out the long-term preservation of digital materials.
The prospects for cultivating a shared view on this issue hinge 
on three factors — identification and development of standards 
to support digital preservation, suitable benchmarks and 
evaluative procedures for assessing the outcomes of digital- 
preservation processes, and mechanisms for certifying 
adherence to a minimum set of practices on the part of digital 
repositories. 


The emergence of standards would benefit many 
aspects of the preservation process. Some progress can 
already be reported. The Open Archival Information System 
reference model, which details a conceptual framework for 
an archival repository as well as the environment in which it 
operates and the information objects it manages, has been 
well received and extensively applied in the digital- 
preservation community. But many other areas remain to be 
addressed, ranging from preservation-quality digital formats 
to optimal preservation strategies for various classes of digital
materials.


Digital preservation would also benefit from the 
articulation of benchmarks or metrics for evaluating the 
efficacy of preservation processes as they unfold.
Preservation activities necessarily require institutions to incur
costs well in advance of realizing benefits. How can decision 
makers be assured that investments to preserve digital 
collections are producing tangible results? It would be useful 
to devise a widely accepted set of evaluative procedures, 
similar to a quality-assurance audit and based on measurable 
aspects of the preservation process, that would serve as a 
reliable indicator of how well preservation activities are 
progressing toward meeting preservation objectives. 


Finally, well-understood processes for preserving digital 
materials must be paired with mechanisms for assessing 
whether a particular digital repository commands the expertise
and resources to carry them out. Preservation requires
institutions to transfer valuable materials into the custody of 
the repository and its staff. These transfers must be 
accompanied by a high degree of confidence that the 
materials will be preserved according to well-known and 
established procedures. Such conditions exist in preservation 
microfilming, where fragile printed materials such as old 
newspapers and books are entrusted to service providers 
with the understanding that the materials will be returned 
unharmed. A similar element of trust must be cultivated in 
the digital-preservation community. One way to contribute to 
this is through the establishment of certification procedures 
for digital repositories. Certification would indicate that a 
repository has met certain minimum requirements in its 
curatorial policies and procedures, including conformance to 
what is regarded as current best practice in digital
preservation.


The development and adoption of standards and 
evaluative metrics, along with certification of digital
repositories, will help dispel fears that scarce resources 
devoted to preservation will be wasted on nonstandard or 
outmoded practices and, as a consequence, fail to realize
their value in use.


7.3.11. Digital Preservation as an Arm’s-length Transaction 


The responsibility for ensuring the permanence of the 
scholarly and cultural record is deeply rooted in the library, 
museum, and archival communities. But the characteristics
of digital materials (their fragility, dependence on technology,
and networked access) have unsettled preservation’s
traditional division of labour.


While it is certain that collecting institutions will continue 
to serve as the primary stewards of society’s memory, it is 
unlikely that every collecting institution responsible for the 
curation of digital materials will have the resources and 
expertise to implement the entire digital-preservation process 
locally. Part of the responsibility may be taken up by third-
party services specializing in the preservation of digital
materials. In this event, digital-preservation activities would 
be conducted as arm’s-length transactions between separate 
parties. This raises several questions concerning how such 
transactions would take place. 


An obvious issue is pricing. The costs of digital
preservation are subject to the vagaries of numerous factors,
chief among which is the constantly evolving technological
environment with which digital materials are so closely 
intertwined. The more rapid the pace of technological change, 
the costlier it will be to ensure that archived digital objects 
remain usable. Given the uncertainty over the pace and 
direction of technological change, it is difficult to estimate 
future preservation costs and, therefore, suitable pricing
scales. Widespread use of relatively stable digital formats
and technology would mitigate this problem but not eliminate
it.


Sustainable pricing models must also be developed. 
There exist several possibilities. For example, the repository 
could charge a one-time, up-front capitalized archiving fee, 
or alternatively, it could distribute the charges over time, 
perhaps as an annual fee. Pricing models must strike a
balance between customers’ needs and preferences (e.g.,
inability to pay a large up-front fee or desire to avoid budgeting
ongoing funds) and those of the repository (e.g., difficulty in
collapsing future preservation costs into a one-time fee or
need to invest large sums up front to meet future preservation
commitments).
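
The trade-off between a one-time capitalized fee and an annual fee can be illustrated with a standard present-value calculation. The sketch below assumes a constant annual preservation cost and a fixed discount rate. Both are simplifying assumptions, since the text stresses that future costs are uncertain, and the function names and figures are hypothetical.

```python
def capitalized_fee(annual_cost: float, discount_rate: float, years: int) -> float:
    """Present value of paying annual_cost at the end of each year for `years` years."""
    if discount_rate == 0:
        return annual_cost * years
    return annual_cost * (1 - (1 + discount_rate) ** -years) / discount_rate


def perpetual_fee(annual_cost: float, discount_rate: float) -> float:
    """Present value of paying annual_cost every year with no end date."""
    if discount_rate <= 0:
        raise ValueError("discount_rate must be positive for a perpetuity")
    return annual_cost / discount_rate


# Hypothetical figures: 100 currency units per year, discounted at 5 percent.
fifty_year_fee = capitalized_fee(100.0, 0.05, 50)
open_ended_fee = perpetual_fee(100.0, 0.05)
print(round(fifty_year_fee, 2), round(open_ended_fee, 2))
```

Under these assumptions the up-front fee for an open-ended commitment is only slightly larger than the fee for fifty years, which is one reason the choice of discount rate matters more than the exact time horizon.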


A related question concerns what is supplied in 
exchange for payment. What preservation guarantees can 
the digital repository offer? To what compensation is the 
depositor entitled if promised outcomes are not achieved? 
Should the repository guarantee a specific outcome 
associated with its preservation process — “these digital 
objects will be renderable, using contemporary technology, 
in fifty years”, or should only the process itself be guaranteed— 
“these digital objects will be recorded on up-to-date digital- 
storage media, refreshed at regular intervals, and maintained 
under environmentally controlled conditions”? Resolution of 
these issues must emerge from a convergence of customer 
expectations and repository commitments. 


7.3.12. Digital Preservation as One of Many Options 


An implicit assumption attached to most discussions of 
digital preservation is that materials currently in digital form 
must be preserved in digital form. For some materials—such 
as born-digital materials with no obvious print equivalent— 
there may be no choice but to preserve them as digital objects. 
But a large class of materials, including digital surrogates of 
analog items as well as born-digital objects for which analog 
equivalents can be easily produced, presents other options in
addition to digital preservation. Indeed, analog manifestations
of digital materials may already be the subject of preservation 
efforts, even as their digital equivalents are perceived to be 
at risk. Efforts to preserve digital materials must take into 
account potential overlap with analog preservation activities 
as well as circumstances in which preservation in analog form 
may be preferable to digital preservation. 


A document in digital form comprised solely of text and 
static images can be easily reproduced as a paper document 
with little or no loss of information. In making this document 
part of the permanent scholarly or cultural record, which form 
should take precedence? For example, most researchers in 
the digital-preservation community are familiar with the 
Council on Library and Information Resources (CLIR) reports 
in maroon covers. These reports are available in print form 
and may also be downloaded from the Web in digital form. 
Which copy should be the focus of preservation activity? In
this case, the print and digital versions are, for all intents and
purposes, perfect substitutes. 


In cases where digital and analog versions differ, 
preservation issues become more complex. Even minor 
differences, such as pagination, may elicit questions as to 
which version should be considered the authoritative version 
for scholarly citation. For example, print magazine articles 
are easily cited by volume, issue, and page. However, online 
versions of these same magazines often omit pagination, 
presenting each article as one HTML file of unbroken text. 
More significant differences between digital and analog 
versions impacting appearance, functionality, or content 
amplify the problem. If one institution collects the analog 
version while another collects the digital version, which 
institution holds the official copy of record? Should both 
versions be preserved, or just one? Who decides? 


Preservation decision making in regard to materials 
existing simultaneously in digital and analog forms often must
be informed by a longer view. Are multiple versions of the 
same item expected to coexist indefinitely, or is this merely a 
transitional state, with analog versions gradually supplanted 
by digital equivalents? In the latter case, preservation of only 
the digital version may be appropriate; in the former case, 
preservation of both versions might be necessary, or an 
authoritative version must be selected for preservation. 


The decision to preserve in digital or analog form may 
turn on a simple cost comparison of the two approaches, but 
ideally it should also take into account the preferences of 
users. Librarians discovered some time ago that users were 
resistant to replacing paper publications such as newspapers 
and magazines with microfilm copies, despite the advantages 
the latter format offered in terms of extending the life
of the materials and reducing storage space requirements. 
In the same way, users may prefer that certain information 
resources be preserved as analog objects and others as 
digital objects. User preferences, such as concerns about 
ease of access, may override purely economic factors. 


7.3.13. Digital Preservation as a Public Good 


Few would disagree that preserving an information 
resource benefits its owner, whether a library, museum, 
archive, publisher, or private collector. But preserving a 
resource, and in so doing, making it part of the permanent 
scholarly or cultural record, also confers benefits on society 
at large by securing the resource’s continued availability for 
use by current and future generations of researchers and 
students. An institution that preserves the last copy of a 
resource has performed a service of potentially incalculable 
value to the public. In these circumstances, the benefits from 
preservation are widely distributed; unfortunately, the costs
of preservation are not.

A preserving institution can generate societal benefits
extending well beyond its immediate stakeholders. The costs 
of producing these extra benefits often remain 
uncompensated. In the analog world, inequities in the 
distribution of preservation costs have little impact on 
collecting institutions’ incentives to preserve. This partly 
reflects the mission of these institutions, which includes the 
responsibility to act as stewards of society’s memory. But 
other factors also play a role. Institutions directly own, and 
have physical custody of, one or more copies of the analog 
materials in their collections. The institutions are therefore 
uniquely placed to undertake the preservation of their 
materials, and this enhances the incentives to preserve. 


Another factor that strengthens preservation incentives
for analog materials is that the distribution of the benefits from
preservation is, in a sense, self-limiting. Analog items, such
as print books, can be difficult and/or expensive to access by 
individuals outside the collecting institution’s direct user 
community. For example, interlibrary loan can cost as much 
as $30 to $50 per item. Extremely rare or valuable materials 
may not be circulated at all, further reducing the scope for 
access by outside users. 


The factors that enhance incentives to preserve analog 
materials—physical custody and limited opportunities for 
sharing—break down in the digital world. Digital resources are 
often obtained through license or subscription and then 
accessed by users from all institutions via a central web server 
operated by the publisher rather than being purchased outright 
and transferred into the custody of each collecting institution. 
Institutions, while considering the licensed digital materials 
part of their collections, nevertheless do not have physical 
custody and therefore have little or no opportunity to 
undertake their preservation. 


In addition to diminishing the notion of physical custody, 
digital materials are also more easily shared than analog
materials. Resources can be made available online and 
accessed from all over the world, making an institution’s user 
community potentially limitless. In these circumstances, there 
may be some resistance to underwriting expensive 
preservation activities that benefit a large pool of users, most 
of whom make no contribution to the preserving institution’s 
resource pool (via tuition, taxes, etc.). Incentives to preserve
are further reduced if the materials in question are not unique 
but instead held by multiple institutions. Which institution 
should go to the trouble and expense of preservation when 
the benefits, in terms of making the materials part of the 
permanent scholarly or cultural record, will accrue to all? 


As Donald Waters points out, digital preservation exhibits
characteristics of a public good, chief among which is the 
difficulty of excluding those who do not contribute toward the 
provision of the good from enjoying its benefits. Once a digital 
resource has been preserved by one institution, it has, in a 
sense, been preserved for all. In an era of rising costs and 
shrinking budgets, activities that confer uncompensated 
benefits outside the institution’s immediate stakeholder 
community may diminish in priority. Also, as preservation 
responsibilities diffuse beyond collecting institutions, 
preservation incentives will become even less assured. In 
the absence of a formal preservation mandate, incentives to 
preserve digital materials without compensation for the benefit 
to society as a whole may be weak indeed. 


So, preservation is more than just a technical process of
perpetuating digital signals over long periods of time. It is
also a social and cultural process in the sense of selecting
what materials should be preserved and in what form; it is an
economic process in the sense of matching limited means
with ambitious objectives; it is a legal process in the sense of
defining what rights and privileges are needed to support 
maintenance of a permanent scholarly and cultural record. It 
is a question of responsibilities and incentives, and of
articulating and organizing new forms of curatorial practice. 
And perhaps most important, it is an ongoing, long-term 
commitment, often shared, and cooperatively met, by many 
stakeholders. 


7.4. APPROACHES FOR DIGITAL PRESERVATION 


The following are some important preservation
strategies through which digital preservation can be achieved.


7.4.1. Technology Preservation 


This approach preserves the technology required to access original
records for as long as those records are required. But support
for the software and hardware eventually ceases and the parts
required to maintain the hardware become more and more
scarce as manufacturers discontinue obsolete components.
The number of machines available that are capable of
reading old files continues to decrease, for computers do
not last forever. The skills required to operate the hardware
and software also become rare and eventually disappear.


7.4.2. Printing to Paper 


Although this approach is still in practice, printing all
records to paper is not a viable preservation method.
Printing to paper loses functional or behavioural traits that
the records had in their digital form. Certain information may
also be lost.


7.4.3. Encapsulation 


The encapsulation approach retains the record in its
original form, but encapsulates it with a set of instructions on
how the original should be interpreted. These instructions would need
to include detailed descriptions of the file format and of what
the information means. The process is depicted in Fig. 7.1.


[Figure 7.1 labels: Surface Metadata; Metadata (annotation/labelling); Original software; Operating System software; Emulator Specification]

Fig. 7.1. Encapsulation
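
The idea of encapsulation can be sketched in a few lines of code: the record's bytes are stored untouched, and a description of how to interpret them travels in the same container. The example below is an illustration only. The use of a ZIP container, the manifest.json file name, and the fields of the format description are all assumptions, not a format defined by the text.

```python
import hashlib
import io
import json
import zipfile
from typing import Tuple


def encapsulate(record: bytes, record_name: str, format_description: dict) -> bytes:
    """Bundle the unmodified record with instructions for interpreting it."""
    manifest = {
        "record": record_name,
        "sha256": hashlib.sha256(record).hexdigest(),
        "format_description": format_description,
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as capsule:
        capsule.writestr(record_name, record)  # the original bits, unchanged
        capsule.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def open_capsule(capsule_bytes: bytes) -> Tuple[bytes, dict]:
    """Return the original record and the manifest that describes it."""
    with zipfile.ZipFile(io.BytesIO(capsule_bytes)) as capsule:
        manifest = json.loads(capsule.read("manifest.json"))
        record = capsule.read(manifest["record"])
    return record, manifest


# Hypothetical record and format description.
record = b"title,year\nAnnual Report,1998\n"
description = {
    "encoding": "ASCII",
    "layout": "comma-separated values; the first row holds the column names",
}
capsule_bytes = encapsulate(record, "report.csv", description)
restored, manifest = open_capsule(capsule_bytes)
print(restored == record, manifest["format_description"]["encoding"])
```

The container format itself must of course remain readable, which is why encapsulation is usually combined with well-documented, widely used packaging formats.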




7.4.4. Virtual Machine Software 


Raymond Lorie of IBM Almaden has proposed this 
approach. This addresses the problem of interpreting data 
files in the future by programming a set of instructions to carry
out these interpretations in the machine language of a 
“Universal Virtual Computer” or the UVC. This programme 
would be written at the time the record was archived and would 
be preserved together with the record. In order to interpret 
the record on a future computer, a UVC interpreter would be 
required and this could be produced from the specifications 
of the UVC. With this process the data can be stored in any 
format and the knowledge required to decode it is 
encapsulated in the UVC programme. 
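
The principle can be illustrated with a toy virtual machine: a small decoder programme is archived with the data, and any future computer that implements the simple interpreter can run it. The instruction set, the offset-by-three storage format, and the names run_uvc and DECODER below are invented for this sketch. They are not Lorie's actual UVC specification.

```python
def run_uvc(program, data):
    """Interpret a decoder programme for a toy stack machine and return its text output."""
    stack, output = [], []
    pc = 0       # programme counter
    cursor = 0   # position in the archived data
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "READ":             # push the next archived value
            stack.append(data[cursor])
            cursor += 1
        elif op == "PUSH":           # push a constant
            stack.append(args[0])
        elif op == "SUB":            # replace the top two values with left - right
            right = stack.pop()
            left = stack.pop()
            stack.append(left - right)
        elif op == "EMIT":           # output the top value as a character
            output.append(chr(stack.pop()))
        elif op == "JUMP":
            pc = args[0]
        elif op == "JUMP_IF_EMPTY":  # jump when no archived data remains
            if cursor >= len(data):
                pc = args[0]
        elif op == "HALT":
            return "".join(output)
        else:
            raise ValueError("unknown instruction: " + op)


# Decoder archived with the record: each stored value is a character code plus 3.
DECODER = [
    ("JUMP_IF_EMPTY", 6),
    ("READ",),
    ("PUSH", 3),
    ("SUB",),
    ("EMIT",),
    ("JUMP", 0),
    ("HALT",),
]

archived_data = [ord(character) + 3 for character in "UVC"]
print(run_uvc(DECODER, archived_data))  # UVC
```

Only the interpreter has to be rewritten for a new machine; the archived decoder and the archived data stay exactly as they were stored.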


7.4.5. XML 


Extensible Markup Language (XML) is a text-based markup
language for describing the structure and meaning of data.
As it is text based, it is human readable, but it is designed primarily
to be easy to process by computer. It is an open
standard defined by the World Wide Web Consortium. The
conversion of records to XML format can be seen as a
particular type of migration. It is often regarded as a very
promising present day data format for archiving and 
interoperability and so deserves to be considered as an 
approach in its own right. XML is of the greatest importance
for digital preservation, not just because of its widespread
uptake, but also because it protects the Achilles’ heel of digital
documents: the dependence on obsolete operating systems
and application software. It does this by being platform- and
software-independent.
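
A short sketch shows what converting a record to XML can look like in practice. It uses Python's standard xml.etree.ElementTree module. The record, its field names, and the function name record_to_xml are hypothetical, and the sketch assumes every field name is a valid XML element name.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict) -> str:
    """Serialize a flat record as XML, one child element per field."""
    root = ET.Element("record")
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical record held in an application-specific structure.
source_record = {"title": "Annual Report", "year": 1998}
xml_text = record_to_xml(source_record)
print(xml_text)

# Any XML parser, on any platform, can read the result back.
parsed = ET.fromstring(xml_text)
print(parsed.find("title").text, parsed.find("year").text)
```

Because the result is plain text with self-describing element names, it can still be read in a text editor if no suitable software is available.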


7.4.6. Migration 


Migration can be defined as the transfer of files from 
one hardware configuration or software application to another 
configuration or application. The problem associated with 
migration is that the results are often unpredictable, mostly 
because of a lack of or because the process has not been 
fully tested. The results of migration are difficult to predict,
unless a substantial amount of work is first done regarding
the specifications of the source and target formats. Migration
can influence the authenticity of a document. Each document
that is preserved must be preserved ‘authentically’, otherwise
the meaning and validity of the archival record cannot be
guaranteed. This has both legal and archival implications.
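
One way to reduce the unpredictability of migration is to verify each migrated file against its source and to keep a record of what was done. The sketch below migrates a text record from the ISO-8859-1 (Latin-1) encoding to UTF-8, a deliberately simple case chosen as an assumption for illustration. The function name and the fields of the event record are hypothetical.

```python
import hashlib


def migrate_latin1_to_utf8(source: bytes) -> dict:
    """Re-encode a Latin-1 text record as UTF-8 and document the migration."""
    text = source.decode("latin-1")
    target = text.encode("utf-8")
    # Verification step: the migrated file must yield the same characters.
    if target.decode("utf-8") != text:
        raise ValueError("migration changed the content of the record")
    event = {
        "type": "migration",
        "source_format": "text/plain; charset=ISO-8859-1",
        "target_format": "text/plain; charset=UTF-8",
        "source_sha256": hashlib.sha256(source).hexdigest(),
        "target_sha256": hashlib.sha256(target).hexdigest(),
        "content_verified": True,
    }
    return {"target": target, "event": event}


# The word "Cafe" with an accented final letter, stored as Latin-1 bytes.
source_bytes = b"Caf\xe9"
result = migrate_latin1_to_utf8(source_bytes)
print(result["target"])
print(result["event"]["source_sha256"] != result["event"]["target_sha256"])
```

The bit streams differ after migration even though the content is the same, so checksums alone cannot demonstrate authenticity across a migration; a content-level comparison and a documented event record are needed as well. Richer formats require far more elaborate comparisons than this one.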


7.4.7. Emulation 


The theory behind emulation is that the only way to
ensure the authenticity and integrity of the record over the
long term is to continue to provide access to it in its original
environment, i.e., its original operating system and software
application. This can be done only by preserving not only the
record, but also emulator specifications, which contain
enough details about the original environment for that
environment to be recreated on a future computer whenever
necessary. Emulation strategies would involve encapsulating
a data object together with the application software used to
create or interpret it and a description of the required hardware
environment. As Fig. 7.2 shows, there
are three technical options in emulation: emulating
applications, emulating operating systems, and emulating
hardware platforms.


[Figure 7.2 labels, as far as recoverable: Technical approach (application, operating system, hardware); Digital objects and their integrity; Legal and organisational (IPR issues of proprietary systems and reverse engineering); Central organisation needs to reduce overheads; Standards/open specs; Cost/economics; Metadata; Other issues]

Fig. 7.2. Emulation


It is suggested that these emulator specification
formalisms will require human-readable annotations and
explanations, that is, metadata. Whether a preservation strategy is
emulation based or migration based, the long-term preservation of digital
information involves the creation and maintenance of
metadata. Within an archive, metadata accompanies and
makes reference to each digital object and provides associated
descriptive, structural, administrative, rights management,
and other kinds of information. This metadata will also be
maintained and will be migrated from format to format and
standard to standard, independently of the base object it
describes. A digital object enters a repository as a set of
sequences of bits; it is accompanied by a variety of metadata
related to that object. With proper storage management,
replication and refreshing, this set of sequences of bits can
be maintained indefinitely. The Pittsburgh
Project, the UBC Project, and the SPIRT Recordkeeping
Metadata Project are examples of research projects and practically
based initiatives that have been concerned with the development
of recordkeeping metadata schemes and standards.


The various digitization activities are discussed in detail
later; first, consider digitization policy.


7.5. DIGITIZATION POLICY 


The first step in deciding to digitize is formulating a 
digitization policy. This acts as the equivalent of a collection 
development policy for your digital collections. It should spell 
out the requirements a collection must meet in order to be 
considered for digitization. You may want to begin with a vision 
statement for digitization that is based on the mission 
statement of your institution. 


It is important that any digitization effort fits into the 
institution's overall mission if it is to have the ongoing support 
of the administration. Once the vision has been stated, the
policy should go on to address access, the condition of the
materials, preservation issues, the audience for the materials,
the ownership of rights, and project support.


7.5.1. Access 


One reason for digitizing materials is to improve
accessibility. Some materials are not accessed because the 
potential users are unaware of them. Others are held in 
institutions that are far from major metropolitan areas. 
Sometimes the materials themselves are so fragile that 
access to them must be severely restricted. Digitization can 
address all of these barriers to access. Putting materials online 
increases public awareness of an institution’s holdings and 
may lead to increased traffic at the museum or library. 
Distance and travel are no barrier to accessing online
collections. As long as users have access to the Internet, 
they can do their research from anywhere in the world. For 
professional researchers, this replaces lengthy 
correspondence with archivists and allows preliminary 
research to be conducted before traveling to an institution to 
view original materials. For the younger student or the 
hobbyist, online collections provide access to materials they 
would otherwise never get to see. 


While enhancing access may be the main goal, a policy 
must also cover any restrictions on access. These can range
from inserting watermarks in photographs to prevent image 
theft to restricting access only to subscribers, members, or 
students and faculty. Restrictions on access may be 
necessary due to ownership rights or copyright issues, or out 
of the need to recoup costs by charging for access. The latter 
is not an ideal situation; librarians have a strong tradition of 
providing free access to all. Sadly, economic realities may
dictate that the price of digitization, including the often 
overlooked costs of maintaining a digital collection, be 
subsidized in some way by users. Whatever the institution 
decides about these issues should be spelled out clearly in 
the digitization policy. 


7.5.2. Condition of Materials 


A primary consideration is the condition of the original 
materials. While you may want to give priority to materials 
that are quickly deteriorating, you must consider the difficulties 
posed when working with fragile materials. Will the digitization 
process cause irreparable damage to the originals? In most 
cases, you do not want to destroy or damage the original to 
obtain a digital surrogate. If the materials are not unique, you 
may decide to risk ruining one copy in order to digitize the 
work, but a unique artifact must never be damaged in the 
digitization process. The point of digitization is to improve 
access to the object and to prevent wear and tear on the
original, not to replace the original with a digital substitute. 
Getting a good digital image from an old document is difficult, 
and enhancing the images or retyping text adds considerably 
to the cost of a project. If funds are limited, it may be wiser to 
undertake a project that will require less time and effort to get 
a usable product. 


7.5.3. Preservation 


Digitization offers an excellent opportunity to handle 
conservation and preservation needs. The materials must be 
in the best possible condition before digitization begins. Make 
any necessary repairs to minimize damage and produce the 
best possible digital image. Set out the guidelines for 
preservation of originals in your digitization policy. 


The other preservation issue that must be addressed 
in your policy is the long-term maintenance of the digital files. 
Consider how to migrate your files to new hardware and new 
platforms, how to maintain the functionality of your digital 
collection, and how to pay for its preservation. All too often 
projects focus on creating digital text and images and getting 
the collection online, but fail to plan for the future. If you are 
investing time, money, and effort in building your digital 
collection, you want to make sure that it is accessible over 
the long term. Your digitization policy should include your 
plans for maintaining your digital files. 


7.5.4. Audience 


Audience is a key consideration in planning your digital 
collection. Before you digitize anything, you must know who 
your audience is. Would the audience be different if the 
materials were freely available online? Your current audience 
greatly influences the selection of materials to digitize because 
it makes sense to give priority to digitizing materials for which
there is a high demand. You want to save such materials
from the effects of frequent use, and make them easily
available to researchers. The potential online audience may 
lead you to consider documents that are not in high demand,
simply because no one knows about them. Think about the 
hidden treasures of your collection. Perhaps a digital project 
is the way to reveal them to the world. 


Although the Internet has a global audience, you should 
give greater weight to the interests and needs of the audience 
in your traditional service area. If your institution has a 
mandate to serve a particular population, take the wants and 
needs of that group into account. You may want to concentrate 
on materials indigenous to your state or country, or of special 
interest to its citizens. A thorough knowledge of your current 
and potential audience is essential to the success of your 
digital projects. Grant-funding agencies expect you to discuss 
your stakeholders, i.e. those who will benefit from your project, 
in any application for funding. Knowing whom you serve and 
what materials they want to access is critical to your success. 
Do your audience analysis early, and keep it in mind through 
all stages of your project. 


7.5.5. Ownership Rights 


The most important issue affecting what you put online 
is ownership. If you do not have the rights to materials, you 
cannot put them online unless you want ugly and expensive
legal problems. If you are dealing with previously published 
materials, copyright issues come into play. If the material in 
question is still covered by copyright, you must obtain 
permission of the copyright holder before you can publish an 
online version. The copyright holder's willingness to grant 
permission hinges on whether or not the material still has 
commercial value. A professional society may be reluctant to 
allow the digitization of recent issues of its journals which it 
still hopes to sell, but will permit you to publish online versions 
of older issues. 




It will sometimes be difficult to determine who holds 
the copyright to a work. This is especially true in the case of 
newspaper and journal articles. The copyright holder may be 
the publisher, or it may be the author of the piece. In the 
latter case, you would have to track down the writer or his or 
her heirs to obtain a release. These complications can be a 
deterrent to digitizing previously published material, unless it 
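is old or a government publication. In some countries, such
as the United States for federal works, the materials published
by government agencies are automatically in the public domain;
in others they remain under copyright, so check the local law.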


Copyright is a complex subject, and the consequences
of copyright violations can be severe. You would do well
to consult legal counsel before proceeding if you are in any
doubt as to the copyright status of a work you plan to digitize.


Unpublished works, such as personal papers or oral 
histories, present another set of problems. If the owner of the 
papers is long dead, there should be few problems with 
proceeding to digitize. However, if the owner is alive or only 
recently deceased, the issue is murkier. The owner’s 
permission may be easily obtained, or may have been 
included in the donor’s agreement when the papers were 
given to the institution. The papers are likely to include letters 
or other writings to or about other people, some of whom 
may still be alive. While your institution may have a legal right 
to publish these letters or documents, there are ethical 
questions that come into play. If there are derogatory 
comments or unsavory revelations in the materials that may 
cause pain or embarrassment to the living, should these 
documents be made public? What is the institution’s 
responsibility in this case? Similarly, oral histories and 
interviews may have been recorded before the issue of
digitization arose. The interviewee may have misgivings or
objections to having their words shared with the world. That
is a level of exposure far beyond what they anticipated when
they agreed to speak. Unless the subject gave permission
for online dissemination at the time of the interview, the
appropriate thing to do is to contact them and request 
permission before making the interview or a transcript 
available online. It will be helpful to address these matters in 
a digitization policy, so you have some guidance when such 
a situation arises. 


7.5.6. Project Support 


Finally, there is project support, in terms of money and
commitment. As stated earlier, digitization is an expensive 
process, and nothing can be done without sufficient funds. 
The price of digitization includes not only the start-up costs - 
housing, hardware, software, training - but the labour or 
outsourcing costs for performing the work of digitization and 
the ongoing, long-term costs of preservation and 
maintenance. Your digitization policy should set out guidelines 
to help you determine costs that require outside funding of 
some sort and overhead costs that are absorbed into the 
institutional budget. It should also articulate the institution’s 
commitment to digitization as part of its mission. A digitization 
program requires a serious, ongoing commitment from the 
institution to ensure its long-term survival. It would be most
unfortunate to spend large amounts of time and money 
digitizing materials that will disappear in a few years because 
there was no institutional support for migrating and 
maintaining those digital collections. If an institution is not 
committed for the long haul, it should reconsider whether it 
wants to proceed with a digital project. 


7.5.7. Selection Criteria 


Once you have completed a digitization policy, you can 
use it as a guide for the selection criteria for your digital 
projects. It can be very helpful to prepare a checklist of these 
criteria to apply to any proposed project. Stuart Lee of Oxford
University has developed a Decision Matrix for Proposed
Digitization Projects that is an excellent guide for formulating
your selection criteria checklist. It is online at
http://www.bodley.ox.ac.uk/scoping/report.html.


7.8. DIGITIZATION - POINTS TO BE CONSIDERED 


To make decisions on any technical issue, the digitiser 
must have a clear understanding of how they expect the digital 
object to be utilized. This could include, for example: 
increasing access; facilitating new research; aiding 
conservation; adding value, by perhaps including the objects 
in a learning and teaching environment; or comparing and 
searching objects alongside electronic resources of a similar 
nature. The possibilities are many, and none are mutually 
exclusive. Questions such as who is expected to use the
resource and why, and what users are expected to do with
the resource, should be addressed at the outset. When
answers to these questions are known, the digitizer can then
make effective and proper technical decisions about how
to digitize the object.


7.8.1. Compression 


The more detail that is stored in a digital file, the larger
the file becomes, which in turn affects the storage and
dissemination of the object. If a high resolution
object is created, it is considered best practice to store at 
least two versions of the object: 


— the master high resolution uncompressed file that is 
archived at point of capture, and 


— a surrogate compressed version derived from the 
master for dissemination purposes. 


The uncompressed master or archival version should,
wherever possible, be stored in an open, well-documented
file format.


7.8.2. Types of Compression


There are two types of compression available: ‘lossless’
and ‘lossy’. Lossless compression enables a reduction in file
size without losing any information within the file, hence
lossless. It does this using algorithms (or software routines)
that reduce the number of bits used to represent data in a
file, thereby reducing its size while retaining all the original
information. Lossless compression typically achieves space
savings of up to thirty percent, although the result varies
with the content of the file.
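

The defining property of lossless compression, that the original can be rebuilt exactly, can be checked directly. The following is a minimal sketch in Python using the standard zlib module; the sample data is hypothetical and deliberately repetitive, so the saving it shows is not typical of scanned images.

```python
import zlib

# Hypothetical sample data: highly repetitive text, which compresses well.
original = ("Manual of Digital Libraries. " * 200).encode("utf-8")

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

# Lossless: every byte of the original comes back unchanged.
assert restored == original

saving = 1 - len(compressed) / len(original)
print(f"original {len(original)} bytes, compressed {len(compressed)} bytes")
print(f"space saving: {saving:.0%}")
```

The saving depends entirely on how much redundancy the data contains, so it should be measured on real files rather than assumed.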


Lossy compression, on the other hand, reduces file size 
with a corresponding loss of data. It works by eliminating 
information from the file that the program deems superfluous. 
The lost information is either unnoticeable to the user, or can
be approximated during decompression by interpolation from the
remaining data. Lossy is probably the appropriate type of
compression for data that is originally analogue rather than
digital, such as video and sound clips, and continuous-tone
greyscale or colour images. The formats of the Joint
Photographic Experts Group (JPEG) and the Moving Picture
Experts Group (MPEG) are two common types of lossy
compression, and the file reduction can be quite dramatic,
resulting in files that are possibly ninety percent smaller.
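

To illustrate the principle only, the sketch below applies the crudest possible lossy step to a few hypothetical 8-bit greyscale values: it keeps the top four bits of each value and discards the rest. This is not how JPEG or MPEG work internally; it simply shows that the discarded detail can be estimated afterwards but not recovered.

```python
def quantize(samples, keep_bits=4):
    """Lossy step: keep only the top `keep_bits` bits of each 8-bit sample."""
    shift = 8 - keep_bits
    return [s >> shift for s in samples]


def reconstruct(codes, keep_bits=4):
    """Estimate the original values; the discarded low bits are gone for good."""
    shift = 8 - keep_bits
    half_step = (1 << shift) >> 1  # aim for the middle of each lost range
    return [(c << shift) + half_step for c in codes]


samples = [0, 17, 100, 101, 200, 255]  # hypothetical 8-bit greyscale values
codes = quantize(samples)              # each value now needs 4 bits, not 8
approx = reconstruct(codes)

max_error = max(abs(a - b) for a, b in zip(samples, approx))
print(codes)      # 100 and 101 collapse to the same code
print(approx)
print(max_error)
```

Halving the bits per value halves the storage, at the price of a bounded error in every value.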


7.8.3. Pathways 


When starting to digitize, it is thought best to capture from
a source as close to the original analogue object as possible.
Depending on the nature of the source this could mean capturing
directly using, for example, a flatbed scanner to digitize a text
document, a digital camera to capture an object, or a digital
camcorder or audio recorder to capture moving images or sound.
However, it may be that the source is one step removed from the
original, as when capturing a slide or photograph of an object, or
digitizing an interview stored on analogue audio tape. It may be
further removed still, as with a photocopy of a document or a
second analogue recording of a recording. The further the source
is from the original, the less likely the resulting file is to be
a true representation of the original object or event, and the
more likely it is that erroneous data has crept into the file.


Making the decision about the method of capture is very
much a project decision based on: the type of source material
being digitised; the equipment and staff skills available; and
the budget allocated, for both equipment and staff time.


7.8.4. Different Types of Digital Scanning 


Sometimes decisions about how to capture are
self-evident and easy to make. For example, if digitizing a
collection of 35mm slides, then a slide scanner is probably
the best solution. If scanning a series of flat documents, an
A4 flatbed scanner would be a good choice. If capturing a
museum collection that includes flat paintings, three-dimensional
objects and written documents, there is more than one way
to go about digitising the collection.


If digitizing a manuscript, there is often the need to have 
text transcribed beside an image. If filming an event there is 
likely to be moving image, sound, and text transcription and 
perhaps also still images as part of the final resource. 


7.8.5. Digital Objects 


There are various issues concerned with the capture 
of digital objects depending on the nature of the object itself. 
There are three broad types of digital object: 


• text based,
• image based, and
• time based.


7.8.5.1. Text Based


Text processing and the scholarly use of digital texts are
relatively long established, going back to the early days of
computing in the late 1960s. The early text format used in
computing was the American Standard Code for Information
Interchange, more commonly known as ASCII text and
sometimes referred to as plain text. ASCII is usually an
adequate format for sharing documents between computers
and applications when the text contains only simple, modern
English prose. Simple US-ASCII uses 7 bits of information
for character encoding, leaving the 8th bit of each byte
unused or available for purposes such as parity checking.
Thus ASCII can encode only 128 characters: the upper
and lower case characters of the Roman alphabet, the digits,
the more common punctuation characters, and a set of
control codes.
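

The 7-bit limit is easy to observe. In the minimal Python sketch below, the helper name is_ascii_safe and the sample strings are illustrative assumptions; it shows that every ASCII byte has a value below 128 and that text with accented letters cannot be stored as ASCII at all.

```python
def is_ascii_safe(text):
    """Return True if every character fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


encoded = "Digital Libraries".encode("ascii")

# Seven bits give 2**7 = 128 possible values, numbered 0 to 127.
highest = max(encoded)
print(highest)                             # below 128
print(is_ascii_safe("Digital Libraries"))  # plain English prose
print(is_ascii_safe("Bibliothèque"))       # the accented letter does not fit
```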


A second and later text format is Unicode. It solves the 
problems of limitation evident in the use of ASCII plain text. 
The aim of Unicode is for all of the characters in all of the 
world’s languages, including some languages of the past, to 
be mapped unambiguously onto a distinct numerical code. 
Unicode is certain to become the standard for character 
encoding in the future, and is already supported by the latest 
versions of the major operating systems. To date, Unicode 
offers mappings for all major languages and coverage of less 
commonly supported languages is ongoing.
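

The mapping of each character onto a distinct number can be seen with a few sample characters. This is a small Python sketch; the characters chosen are arbitrary, and UTF-8 is used as one common way of storing the numbers as bytes, which is an assumption beyond what the text above states.

```python
samples = ["A", "é", "अ", "中"]  # Latin, accented Latin, Devanagari, Chinese

# Each character maps to exactly one numerical code point.
code_points = {ch: ord(ch) for ch in samples}

# UTF-8 stores code points in one to four bytes each.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    print(f"{ch}: U+{code_points[ch]:04X}, {utf8_lengths[ch]} byte(s) in UTF-8")
```

Characters already present in ASCII keep the same numbers in Unicode, which is why plain English text is valid in both.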


There are two main methods used to digitize existing
texts: transcription and Optical Character Recognition
(OCR).


Text Transcription : Transcription is the simplest
method of digitization, as it requires only a person, keyboard
and monitor. It denotes the manual keying of text into a
computer. Transcription can be very accurate, particularly
when working on documents with complex layouts and
passages of text that are difficult to read, for example
hand-written diaries with notes in margins and text
flowing at odd angles, or newsprint made up of blocks of
unrelated text on the page. However, text transcription can
also be time consuming, particularly if the work is outsourced.
Also, spelling and other errors made by the transcriber can
be difficult to find and fix as they are random. It is best practice
for any text transcription to have two people working on a
document, one person transcribing and another proofreading.


Optical Character Recognition (OCR) : The second 
method used to digitise text is scanning using OCR software. 
This is a more automated method of digitization, and OCR 
works by scanning a document and using a computer 
programme to ‘read’ the resultant digital image. OCR software 
employs various methods to achieve its results, such as: 


• Pattern Recognition, which uses pre-recorded images
in a database. It is good for documents with a uniform
typeface.


• Feature Extraction, which recognizes characters by
their shape. Again this is good for high quality prints.


• Structural Analysis, which examines the structure of
each character, that is how many vertical and horizontal
lines it contains. It is better with poor quality texts.


• Neural Networks, which work by comparing each
character with characters the software has been trained
to recognise. The neural nets therefore change and
‘grow’ over time. Each character is given a confidence
level, and, again, this is better with texts of poor quality.


OCR has the benefit of being much faster than
transcription, and therefore more economical, especially for
clear type-written documents with simple layout. Furthermore,
errors made by the software tend to be systematic, and
therefore easier to find and fix. OCR is less good when
document layouts are complicated and text is difficult to read.
Again, as with text transcription, it is best practice to error
and spell check documents thoroughly, ideally using two
people.
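

Because OCR errors tend to repeat, a simple count of unrecognised words often exposes them. The Python sketch below is one hedged illustration: the function name suspicious_words, the sample text and the tiny vocabulary are all hypothetical, and a real project would use a full dictionary for the language of the source.

```python
import re
from collections import Counter


def suspicious_words(ocr_text, vocabulary):
    """Count tokens that do not appear in the reference vocabulary."""
    tokens = re.findall(r"[a-z0-9!|]+", ocr_text.lower())
    return Counter(t for t in tokens if t not in vocabulary)


# Hypothetical OCR output in which a final "l" was read as "!".
ocr_text = "Manua! of Digital Libraries. The data mode! A data mode!"
vocabulary = {"manual", "of", "digital", "libraries", "the", "data", "model", "a"}

flagged = suspicious_words(ocr_text, vocabulary)
print(flagged.most_common())

# A systematic error can then be corrected in a single pass.
corrected = ocr_text.replace("mode!", "model").replace("Manua!", "Manual")
print(corrected)
```

A human proofreader is still needed to confirm each correction, since a blanket replacement can damage text that was already right.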


7.8.5.2. Image Based


Image based digitization can be of the following types.


Raster (or Bit-mapped) Images : Raster images are
made up of pixels, and each pixel stores information about
the colour of one point in the image.


A black and white (bi-tonal) image has one ‘bit’ of
information per pixel, either black or white. A greyscale image
will be made up of 8 bits per pixel, going from white to black
through 256 shades of grey. The abbreviation RGB stands for Red,
Green, and Blue. For an RGB colour image there are commonly 8
bits per channel, making a 24 bit image. The latest capture
equipment has the possibility of going up to 48 bits, with 16 bits
of information per red, green and blue channel.
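

The number of distinct values a pixel can take doubles with every added bit. The short Python sketch below works this out for the bit depths mentioned above; the function name colours_for_depth is illustrative only.

```python
def colours_for_depth(bits_per_pixel):
    """Number of distinct values a pixel can hold at a given bit depth."""
    return 2 ** bits_per_pixel


depths = [("bi-tonal", 1), ("greyscale", 8), ("RGB, 8 bits per channel", 24),
          ("RGB, 16 bits per channel", 48)]

for label, bits in depths:
    print(f"{label}: {bits} bits per pixel, {colours_for_depth(bits):,} values")
```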


Raster images are the most common image type on 
the web, for example in the file formats JPEG and GIF. Raster 
data models are often used in geographic information systems 
(GIS) to represent continuous surfaces - such as satellite 
images or historic maps. 


The resolution of an image concerns the number of 
pixels held within the digital file, and is measured in pixels 
per inch (ppi). The more pixels stored per inch, the greater 
the density of the colour information, and therefore the greater 
the detail evident in the image. This is known as ‘scan’ 
resolution, and it is important to scan at an appropriate 
resolution. The appropriateness of the resolution chosen 
depends on the intended purpose of the digital image. 
However, resolution or ppi is only an indicator of image size,
and therefore of ‘quality’.
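

The link between scan resolution, pixel dimensions and uncompressed file size is simple arithmetic. The Python sketch below assumes a hypothetical 8 by 10 inch photograph scanned in 24 bit colour; the function name scan_size is illustrative, and real files will differ slightly because of headers and any compression.

```python
def scan_size(width_in, height_in, ppi, bits_per_pixel):
    """Return (width_px, height_px, uncompressed_bytes) for a scan."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, size_bytes


for ppi in (300, 600):
    w, h, size = scan_size(8, 10, ppi, 24)
    print(f"{ppi} ppi: {w} x {h} pixels, {size / 1_000_000:.1f} MB uncompressed")
```

Doubling the resolution quadruples the number of pixels, and with it the storage needed for the master file.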


In the context of a GIS (Geographical Information 
System), each pixel represents a known area on the surface 
of the earth. In this context, each pixel summarizes a square 
of known dimension: for a high resolution aerial photograph
a pixel may have a ‘ground resolution’ of 25cm, while for a
low resolution satellite image it may summarize an area of
several square kilometres.




Vector Images : Another type of image is a vector 
graphic. Rather than being made up of pixels, these images 
are co-ordinate based; so two points a and b define a line, 
and three or more points, define an area. 


A common file format used to create vector graphics is
Encapsulated PostScript (.eps). Scalable Vector Graphics
(.svg) is a newer format that utilizes XML technologies, and
may become an industry standard. A significant benefit of vector
images is that because they are co-ordinate based, as
opposed to pixel based raster images, they can be zoomed
to any size without pixelation. Vector graphics are often used
in virtual reality and 3-D modelling, as well as in Macromedia
Flash applications.


A good simple example of a vector graphic at work is a 
font used in a word processor. These fonts, or rather the 
images used to represent the fonts, are vector based, and 
when you increase the size say from point 10 to point 28, the 
image increases in size without any degradation. Vector data 
is also widely used in architecture and cartography, such as 
in Computer Aided Design (CAD) or in Geographic 
Information Systems. In these contexts the ability to use a 
single co-ordinate system allows diverse types of information
to be reconciled. 
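

The difference between enlarging a vector shape and enlarging a raster image can be sketched in a few lines of Python. The triangle, the two by two pixel grid and both function names are hypothetical; the raster enlargement shown is the simplest nearest-neighbour method, used here only to make the blockiness visible.

```python
def scale_points(points, factor):
    """Enlarge a vector shape by multiplying its co-ordinates."""
    return [(x * factor, y * factor) for x, y in points]


def upscale_raster(rows, factor):
    """Nearest-neighbour enlargement: each pixel becomes a factor x factor block."""
    return [[px for px in row for _ in range(factor)]
            for row in rows for _ in range(factor)]


triangle = [(0, 0), (4, 0), (0, 3)]   # three points define an area
big_triangle = scale_points(triangle, 10)

pixels = [[0, 1],
          [1, 0]]                      # a tiny bi-tonal raster image
big_pixels = upscale_raster(pixels, 2)

print(big_triangle)   # still three exact points
print(big_pixels)     # the same four pixels, now as visible blocks
```

The enlarged triangle is still described by three exact points, whereas the enlarged raster contains no new detail, only bigger squares.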


7.8.5.3. Time Based


Digitizing time-based media, that is sound and video, raises
some different concerns for the digitizer.


The first major issue to be addressed is the large size 
of the digital files produced, relative to most other forms of
digitization. To give an idea of scale, one second of digitized 
sound produces the same size of file as around one quarter 
of the complete works of Shakespeare digitized as text. 


The second issue, related to the size of the files, is how
to enable sound and video to be disseminated and displayed
over the web. As with large raster images, some form of file
compression is required to do this, and types of appropriate
compression are listed below. Finally, unlike most other types
of digital object, which are readily viewable on screen,
most compressed sound and video requires a plug-in or
viewer before the file can be used.


Sound : The process of moving from analogue to digital
sound is called sampling. To convert an analogue sound
to digital, one must sample the signal many times per second.
The sampling rate is measured in Hertz (Hz), and the range of
each sample is measured in bits. For high quality digitization
a minimum sampling rate of 36 kHz is normal, and
the standard highest frequency for most computers is 44.1
kHz. In terms of bit depth, 16 bits per sample is considered
good enough. At 44.1 kHz this gives an uncompressed bit rate
of about 706 kb/s per channel, or about 1,411 kb/s for stereo.
Common types of uncompressed sound file formats are Microsoft’s
Waveform PCM encoding (.wav), the default sound format
on the MS Windows platform, and Audio Interchange File
Format (.aiff), the default format for the Apple MacOS. These
formats provide accurate, high quality and lossless sound
files. However, the file sizes are large, so they are suitable
for archival master versions of files, but not for dissemination
over the web.
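

Sampling and the resulting data rate can be worked through numerically. The Python sketch below samples a hypothetical 440 Hz tone at 44.1 kHz and 16 bits, then computes the uncompressed bit rate as sampling rate multiplied by bits per sample and number of channels. The function names and the choice of tone are assumptions made for illustration.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, bits, n_samples):
    """Sample a sine tone and round each value to a signed integer of `bits` bits."""
    peak = 2 ** (bits - 1) - 1
    return [round(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
            for i in range(n_samples)]


def pcm_bit_rate(sample_rate_hz, bits, channels):
    """Uncompressed bit rate in bits per second."""
    return sample_rate_hz * bits * channels


samples = sample_sine(440, 44100, 16, 441)   # 441 samples = one hundredth of a second
stereo_rate = pcm_bit_rate(44100, 16, 2)     # bits per second
bytes_per_minute = stereo_rate * 60 // 8

print(samples[:5])
print(f"{stereo_rate / 1000:.1f} kb/s uncompressed stereo")
print(f"{bytes_per_minute / 1_000_000:.1f} MB per minute")
```

At roughly ten megabytes per minute, uncompressed stereo sound makes a sound archival master but a poor web delivery format.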


Compressed ‘lossy’ audio formats use specialist codecs
(Compressor/DeCompressor) to compress audio data.
Popular variants include MPEG-1 Audio Layer 3 (MP3), Ogg
Vorbis and Real Audio. MP3 is the most common compressed
format. A sample rate of 44.1 kHz and bit-rate of 192 kbps or
higher is advised to preserve quality. Using MP3 it is possible
to achieve relatively big reductions in file size, down to about
one twelfth of the size of lossless .wav files.


Moving Image : A digital video file is a sequence of still
images or frames, played in rapid succession, usually
accompanied by audio data played in tandem. When played
at a set rate (12 frames per second, or higher), the image
sequence creates the illusion that the onscreen object is
moving. Some common video formats are MPEG-1, MPEG-2,
Audio Video Interleaved (.AVI) and Quicktime (.QT). Higher
compression rates, and therefore smaller files, may be
achieved by using a third party codec such as DivX.


For example, a 450 MB MPEG-1 file may be
compressed by more than half to around a 180 MB DivX file.
MPEG-1 is a stable and documented format, which can be
played back by most current computers and digital video
players without the need for additional software. However,
MPEG-2 and, more recently, MPEG-4, are better suited to
high quality video, for they do not place restrictions on
resolution and dimensions, as happens with MPEG-1. MPEG-2 and
MPEG-4 are likely to be the recommended archival formats
for digital video for the next few years. There are many
proprietary codecs available for playing desktop video, such
as Quicktime, Real Media and Windows Media Video, which
are useful for specific tasks, but none of these formats should
be considered for a master archival version of a file.


7.9. DATA MODELS 


The digital objects referred to above are, of course,
single entities. The outcome of any digitisation process will
likely be many hundreds, sometimes thousands, of these
objects. These may consist of objects of one type,
for example a hundred digital images, or sometimes
combinations of text, image and time-based objects. It is
necessary, therefore, to manage and organize these
resources efficiently to enable their effective use and retrieval.
In essence this is an information management issue, which
is sometimes referred to as ‘data modelling’. In other words,
data modelling is the mechanism by which individual digital
objects are stored, organised and managed together. There
are essentially four conceptual methods of organising data:
Lists, Hierarchies, Sets, and Geometric or Co-ordinate based
systems.


A simple example of a data model would be if you had
a list of contact details giving, say, contact name and email
address. The best way to model this list data is in a tabular
format, either in a spreadsheet, or if slightly more complex
data manipulation and searching is required, in one table of
a database. The wrong way to organize this data would be to
store it as free text in a word-processed document, say an MS
Word document. This would inhibit the effective use of the
digital resource, as sorting and finding contacts would be
made more difficult.
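

The advantage of the tabular form can be shown with a few lines of Python and the standard csv module. The names and addresses below are invented for the example; the point is that once each contact is a row with named columns, sorting and searching become one-line operations.

```python
import csv
import io

# Hypothetical contact list held as comma-separated rows with a header line.
contacts_csv = """name,email
Chitra,chitra@example.org
Asha,asha@example.org
Bharat,bharat@example.org
"""

rows = list(csv.DictReader(io.StringIO(contacts_csv)))

by_name = sorted(rows, key=lambda r: r["name"])             # sort on one column
found = [r for r in rows if r["email"].startswith("asha")]  # search on another

print([r["name"] for r in by_name])
print(found)
```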


Another method of modelling data is in a hierarchy. 
This is the classic tree structure, common for storing files 
inside folders on a computer. This structure can be seen 
graphically by viewing files on a Windows machine, using 
the Windows Explorer option. There are various resources 
that suit such organization. Most texts are organized naturally 
in a hierarchical fashion, e.g. a book, inside which are 
chapters, which consist of pages, which consist of sentences 
and so on. Or a poem, which has stanzas, which have lines,
which have words etc. All text mark-up, whether Extensible
Mark-up Language (XML), Hypertext Mark-up Language
(HTML) or Standard Generalised Mark-up Language (SGML),
follows this pattern. This is sometimes referred to as the
‘parent-child’ model. It is also common for archival resources
to be organized in a hierarchy, as they are logically ordered in
this way in the analogue world, for example, in boxes that
contain files that contain documents. Therefore, most electronic
archives will be stored in some form of hierarchical database.
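

The parent-child pattern of mark-up can be explored with the XML parser in the Python standard library. The poem below and its element names (poem, stanza, line) are made up for the example and do not follow any particular encoding scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical mark-up: a poem contains stanzas, which contain lines.
poem_xml = """<poem title="Example">
  <stanza n="1"><line>First line</line><line>Second line</line></stanza>
  <stanza n="2"><line>Third line</line></stanza>
</poem>"""

root = ET.fromstring(poem_xml)

stanzas = root.findall("stanza")                       # children of the poem
lines_per_stanza = [len(s.findall("line")) for s in stanzas]
all_lines = [line.text for line in root.iter("line")]  # every descendant line

print(lines_per_stanza)
print(all_lines)
```

Each element has exactly one parent, which is what makes the structure a tree.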


Sets are an effective method of storage particularly for
objects that have clear relationships with one another. A good
example of the set data model is a relational database, where
one object can have many related objects. This is known as
a ‘one to many’ relationship. The advantage here is that
more objects can be added to the many side of the relationship
without unnecessary repetition of information pertaining to
the main object. For example, a relational database that
contains images may have one main image with six related
images showing different views. In this case, the main image
information would be entered once only. Thus, relational
databases avoid the unnecessary duplication of the same
information in a database, sometimes known as avoiding
redundancy.
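

The main image with six related views can be modelled in a few statements using sqlite3, the relational database bundled with Python. The table names, column names and sample values are hypothetical; this is a sketch of the one to many pattern, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE main_image (id INTEGER PRIMARY KEY, title TEXT, creator TEXT)")
conn.execute(
    "CREATE TABLE related_image (id INTEGER PRIMARY KEY, "
    "main_id INTEGER REFERENCES main_image(id), angle TEXT)")

# The description of the main object is entered once only.
conn.execute("INSERT INTO main_image VALUES (1, 'Bronze vase', 'Unknown')")

# Six related views refer back to it by its identifier.
angles = ["front", "back", "left", "right", "top", "base"]
conn.executemany(
    "INSERT INTO related_image (main_id, angle) VALUES (1, ?)",
    [(a,) for a in angles])

rows = conn.execute(
    "SELECT m.title, r.angle FROM main_image m "
    "JOIN related_image r ON r.main_id = m.id ORDER BY r.id").fetchall()
main_count = conn.execute("SELECT COUNT(*) FROM main_image").fetchone()[0]

print(rows)
print(main_count)
```

Adding a seventh view means one new row in the related table, with no change to the main record.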


In Geography or Geometry based models, some data 
sets can be reconciled and explored most effectively by 
exploiting shared geography. So for example, modern and 
historic maps could be plotted together, with observations 
taken in the field or digitized from another source. A common
model for storing such data is a Geographical Information
System (GIS). GIS combine many of the features of relational
databases with image processing tools and thus present a
powerful set of tools for unifying diverse data sets and for
analysing relationships between them. In simple terms, a GIS
uses geography as a primary key for data. So, for example, it is
possible to ask questions such as ‘show me all the information
within 500 metres of where I am’ or to produce visualizations
of data that would not normally lend itself to being shown in 
this way - tasks that would be impossible or very time 
consuming by other means. This is made possible when the 
different sets share a common vocabulary of co-ordinates. 
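

A query of the ‘within 500 metres’ kind reduces to a distance calculation once every record carries co-ordinates. The Python sketch below uses the haversine formula on latitude and longitude; the records, their positions and the function names are invented, and a real GIS would use spatial indexes rather than testing every record in turn.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    radius_m = 6371000.0  # mean radius of the earth
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def within(records, lat, lon, radius_m):
    """Names of all records no further than radius_m from the given position."""
    return [r["name"] for r in records
            if haversine_m(lat, lon, r["lat"], r["lon"]) <= radius_m]


# Hypothetical records of different kinds that share one co-ordinate system.
records = [
    {"name": "historic map sheet", "lat": 0.001, "lon": 0.0},
    {"name": "field observation", "lat": 0.002, "lon": 0.002},
    {"name": "satellite image tile", "lat": 0.01, "lon": 0.0},
]

nearby = within(records, 0.0, 0.0, 500)
print(nearby)
```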


Choosing a data model for your project’s digital output
is based on three fundamental questions. How is the data
organized naturally? What is the intended purpose of the
resource, going back to the idea of ‘fit for purpose’? And,
following on from this, what are the intended users’
expectations and experience when using resources of a
similar nature?


The Open Archival Information System (OAIS) 
reference model brings together many of the concepts 
discussed so far in this chapter and points to others that are 
relevant to the question of which attributes of digital materials 
we are attempting to maintain into the future. This model was 
developed to provide a common framework for describing 
and comparing architectures and operations of digital 
archives. It is being adopted throughout the world as an 
important design schema for digital preservation systems. 
The digital preservation community is adopting the concepts 
and terminology embedded within it. The OAIS model is 
‘emerging as the first international standard in digital 
preservation’. 


The OAIS reference model was developed by the
Consultative Committee for Space Data Systems (CCSDS, based
at NASA in the USA) and is now an international standard, ISO
14721:2003. It is based on the concept of open standards,
increasingly being recognized as a stable foundation for digital
preservation activities. As Breeding explains, ‘one of the
most important digital preservation strategies involves taking
advantage of open standards whenever appropriate and
avoiding proprietary data formats when possible’. The OAIS
model has been used as the basis of digital preservation
systems, such as those developed by OCLC, the Koninklijke
Bibliotheek’s e-Depot, and the CEDARS archiving
demonstrator project.


Part of the significance of the OAIS reference model
lies in the establishment of a common language for discussion
of digital preservation. Its Foreword notes that it ‘establishes
a common framework of terms and concepts’ to allow ‘existing
and future archives to be more meaningfully compared and
contrasted’ and to promote standardization. For example, it
defines Long Term as ‘long enough to be
concerned with the impact of changing technologies, including
support for new media and data formats, or with a changing
user community. Long Term may extend indefinitely’. It makes
a clear distinction between simple data storage and long-term
preservation.


A major purpose of this reference model is to facilitate 
a much wider understanding of what is required to preserve 
and access information for the Long Term. To avoid confusion 
with simple ‘bit storage’ functions, the reference model defines 
an Open Archival Information System (OAIS) which performs 
a long-term information preservation and access function. 


The significance of the OAIS reference model also lies 
in its articulation of the functional requirements of a digital 
archival system. It defines five functions: 


— the Ingest function ‘responsible for receiving information 
from producers and preparing it for storage and 
management within the archive’. 


— the Archival Storage function which ‘handles the
storage, maintenance and retrieval of the AIPs (Archival
Information Packages) held by the archive’.


— the Data Management function which ‘coordinates the
Descriptive Information pertaining to the archive’s AIPs,
in addition to system information used in support of the
archive’s function’.


— the Administration function which ‘manages the
day-to-day operation of the archive’.


— the Access function which ‘helps consumers to identify
and obtain descriptions of relevant information in the
archive, and delivers information from the archive to
consumers’.


These five functions ‘taken together . . . identify the key
processes endemic to most systems dedicated to preserving
digital information’. They have proved to be influential, one
indication being the widespread adoption of the term Ingest
in digital preservation discussion.


The OAIS concept of an ‘information package’ is explained as
a conceptual container of two types of information called
Content Information and Preservation Description Information
(PDI). The Content Information and PDI are viewed as being
encapsulated and identifiable by the Packaging Information.
The resulting package is viewed as being discoverable by
virtue of the Descriptive Information.


[Figure: Preservation Description Information (PDI) divided into Reference Information, Context Information, Provenance Information and Fixity Information]


Fig. 7.3. Preservation Description Information


This Reference Model :


• Provides a framework for the understanding and
increased awareness of archival concepts needed for
long term preservation and access;


• Provides the concepts needed by non-archival
organizations to be effective participants in the
preservation process;


• Provides a framework for describing and comparing
different long term preservation strategies and
techniques; and


• Expands consensus on the elements and processes
for long-term digital information preservation and
access, and promotes a larger market which vendors
can support.




The reference model defines common terminologies 
like : AIP (Archival Information Package); SIP (Submission 
Information Package); DIP (Dissemination Information 
Package); PDI (Preservation Description Information). 


It also discusses many important issues, such as: ingest
formats and processing; use of standards; metadata; and existing
records, for example bibliographic records in a world catalogue.


The Preservation Description part of the OAIS
information model divides Preservation Description
Information into four categories:


— Reference Information : It describes identification
systems, and the mechanisms for providing assigned
identifiers, used to unambiguously identify the Content
Information both internally and externally to the archive
in which it resides.


— Context Information : It documents relationships of the
Content Information with its environment, including the
reasons for its creation and relationships to other
Content Information objects.


— Provenance Information : It documents the history of
the Content Information, including its origin, changes
to the object or its content over time, and its chain of
custody.


— Fixity Information : It provides the data integrity checks
or validation/verification keys used to ensure that the
particular Content Information object has not been
altered in an undocumented manner.
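

The four categories can be pictured as the fields of a single record that travels with the content. The Python sketch below is an illustration only: the class, field and function names are invented, the sample identifier and text are hypothetical, and the use of a SHA-256 checksum for fixity is an assumption, since the OAIS model does not prescribe a particular algorithm.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PreservationDescription:
    """Illustrative container for the four PDI categories."""
    reference: str                                   # identifier for the content
    context: str                                     # why it exists, what it relates to
    provenance: list = field(default_factory=list)  # history and chain of custody
    fixity_sha256: str = ""                          # checksum of the content


def describe(content, identifier, context, origin):
    """Build a PDI record for some content, computing its checksum."""
    return PreservationDescription(
        reference=identifier,
        context=context,
        provenance=[origin],
        fixity_sha256=hashlib.sha256(content).hexdigest(),
    )


def verify_fixity(content, pdi):
    """True if the content still matches the checksum recorded in the PDI."""
    return hashlib.sha256(content).hexdigest() == pdi.fixity_sha256


content = b"Transcript of an oral history interview."  # hypothetical content
pdi = describe(content, "archive:0001", "Oral history project", "Digitized from tape")

unchanged = verify_fixity(content, pdi)
tampered = verify_fixity(content + b" (edited)", pdi)
print(unchanged, tampered)
```

Any later change to the content, such as a migration to a new format, would be recorded by adding an entry to the provenance list and storing a fresh checksum.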


In a nutshell, Preservation Description Information
records the identity, relationships, history and integrity of the
archived Content Data Object. It follows that effective
metadata is a necessary condition for effective
digital preservation. The elucidation and maintenance of
Preservation Description Information is the
keystone to building an information infrastructure to support
the processes associated with digital preservation.


In the OAIS model, Digital Migration is defined as the
transfer of digital information, while intending to preserve it,
within the OAIS. It is distinguished from transfers in general
by three attributes: a focus on the preservation of the full
information content; a perspective that the new archival
implementation of the information is a replacement for the
old; and full control and responsibility over all aspects of the
transfer residing with the OAIS.


It is recognized, however, that Digital Migrations are time
consuming and costly, and expose the OAIS to a greatly increased
probability of information loss. Therefore, an OAIS has a
strong incentive to consider Digital Migration issues and
approaches.


7.10. CHOOSING SOFTWARE AND HARDWARE 


It is more important to get the data model right, than it 
is to choose a particular piece of software to represent that 
data model. In principle, for example, if you choose to organize 
your data in a relational database, it does not matter whether 
you choose MS Access, FileMaker Pro, Paradox, MySQL,
PostgreSQL or any other relational database software. In
essence, these products all store data in the same way. There
are caveats to this depending on use. For example, if the 
database is to be searchable over the web, or if a large amount 
of data is being stored and retrieval speeds are important, 
then the more robust MySQL would be preferred over MS 
Access. However, this strays from the basic premise that the 
underlying data are being designed and stored in the same 
model, whatever software is chosen. Similarly, a Lotus
spreadsheet will store information in the same way as an
MS Excel spreadsheet, and Corel WordPerfect stores text
in the same way as MS Word.
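
To illustrate the point that the data model matters more than the product, the sketch below defines a small relational model in plain SQL and exports it as CSV. It is an assumption-laden example: the table and column names are invented, and SQLite (through Python's built-in sqlite3 module) is used only because it needs no installation; it is not one of the products named above, but the same two tables could be declared in any relational database.

```python
import csv
import io
import sqlite3

# Hypothetical two-table data model: collections and the items they contain.
SCHEMA = """
CREATE TABLE collection (
    collection_id INTEGER PRIMARY KEY,
    title         VARCHAR(200) NOT NULL
);
CREATE TABLE item (
    item_id       INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collection(collection_id),
    title         VARCHAR(200) NOT NULL,
    file_name     VARCHAR(200) NOT NULL
);
"""


def build_demo_database():
    """Create an in-memory database holding the model and a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO collection VALUES (1, 'Rare manuscripts')")
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?, ?)",
        [(1, 1, "Folio 1", "folio_001.tif"), (2, 1, "Folio 2", "folio_002.tif")],
    )
    return conn


def export_items_as_csv(conn):
    """Export the joined data as CSV text, a format most software can import."""
    rows = conn.execute(
        "SELECT i.item_id, c.title, i.title, i.file_name "
        "FROM item i JOIN collection c ON c.collection_id = i.collection_id "
        "ORDER BY i.item_id"
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["item_id", "collection", "title", "file_name"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Because the structure lives in the schema rather than in the program, the exported rows can be loaded into another database product without redesigning the data.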




Having said this, there are important considerations to 
bear in mind when choosing software — such as: 


— Making sure the software performs the tasks required; 
— Picking software that is well used and has good support; 


— Choosing software that has good import and export 
functions; and 


— Choosing software that supports recognized 
international standards. 


In addition, digitization and digital preservation need a
minimum set of hardware. Some standard
software and hardware are described in the following pages.


7.10.1. Software 


Software is the key element in successfully creating 
documents for archiving and publishing online. The programs 
you choose not only determine how you create your 
publications, but also how you develop your training methods. 
The logistics of your center depend greatly on how you employ 
and execute your software programs, so pay close attention 
to the types of software available and which programs are 
best suited for your needs. Building your software library 
requires knowing what programs are available and how to 
price them. A good rule to follow when purchasing software 
is to find a product that performs well and offers less expensive 
upgrades when newer versions are available. 


Upgrading allows you to stay with a product that you
like using, and over the long term, you save money because
upgrades are less expensive than full versions.


Another alternative to consider when thinking about 
software is using freeware and/or shareware products in lieu 
of purchasing expensive programs. This approach, however, 
should only be used if the programs in question can
adequately meet your needs.




The types of software programs you will need to create
digital publications include:

• HTML editor;
• XML editor, parser and XSLT processor;
• Text editor and/or word processor;
• Image editor;
• Scanning software;
• OCR software;
• FTP software;
• Page layout and design software; and
• PDF software.


HTML Editor : An HTML (HyperText Markup Language) 
editor is any program that allows you to edit or write HTML 
code and documents. The two main types of HTML editors 
are — text-based editors and WYSIWYG (What You See Is 
What You Get) editors. Text-based editors such as Microsoft 
Notepad, Microsoft Word and Corel WordPerfect allow you 
to only edit or write HTML code. You cannot actually see how 
the HTML document displays until you save the file and open 
it in a browser such as Microsoft Internet Explorer, Netscape 
Navigator or Opera. These text-based programs are used to 
write HTML in its most basic format. There are other text- 
based HTML editors that provide more advanced features 
and were specifically designed for writing HTML. These 
editors make use of tags and features that allow the user to 
click a button for tags, bold, italics, and so forth, instead of
always typing each tag or command by hand. This is helpful
and allows you to avoid repetition and reduces the number of
errors that occur when typing by hand. Two such
programs are NoteTab Pro and Arachnophilia. Free versions
of both are available online. These editors have one big
advantage over word processing programs used as HTML
editors: they produce clean HTML code. Only the HTML
you type into them will appear in your documents. Word
processing programs may insert code you did not type, which
can cause unintended results when the file is opened in a
browser.


WYSIWYG programs allow you to actually look at the 
HTML document as you edit it and make changes that you 
can directly see onscreen. Examples of WYSIWYG programs 
include — Microsoft FrontPage, Netscape Composer and 
Macromedia Dreamweaver. WYSIWYG programs have some 
advantages over text-based editors, such as an easier 
learning curve and being able to see your work as you are 
creating; however, their functions sometimes do not display 
correctly on other browsers that are not native to the program 
with which they are affiliated. For example, a web page 
created with Microsoft FrontPage might not display some 
features correctly in Netscape Navigator. This would cause 
the page to display undesired results that would not represent 
what the designer had in mind. In addition, these programs
sometimes use browser-specific code that is unnecessary 
and difficult to understand or change if you decide you want 
to edit the code by hand at a later date. 


You achieve the best results when you create the HTML 
from scratch. Not only does it give you total control over what 
goes into your HTML document, but it also provides you with
valuable skills that are helpful when writing, editing, or finding
problems in several lines of code. You have a better sense of 
what makes HTML work and how to create pages that are 
dynamic and visually appealing. 


XML Software : An XML editor works much like an 
HTML editor, and some programs can be used for both 
authoring purposes. Some popular XML editors, such as
XMetaL (http://www.softquad.com) or XML Spy
(http://www.xmlspy.com/), assist you through the markup process,
and show tags in a graphical and hierarchical display. These 
editors cost several hundred dollars, and usually include 
parsing software that validates your markup against the DTD 
you have specified. The online Buyer’s Guide on the XML.com 
website at http://www.xml.com/buyersguide/ has reviews of
a number of XML editors. Some word processing programs 
claim to offer XML authoring but are not really well suited to 
that purpose. Often they will add extraneous code that you 
do not want cluttering up your documents. 


If you have no money for software, you can write XML 
documents in a plain text editor. NoteTab offers a free version 
of their software from their website http://www.notetab.com. 
For a minimal cost you can upgrade to NoteTab Pro, which 
shows XML tags in a contrasting color to regular text and 
allows you to program frequently used commands. The 
excellent Help files include instructions in writing these 
commands, called Clips, to automatically enter tags, entities, 
or special character codes, and even to launch other 
programs such as your parser. You can also set up templates, 
something we often do for digital projects. While it does involve
more effort than the out-of-the-box editors, it also offers some
amenities at very little cost.


In addition to an editor, you will need a parser to check 
that your XML is valid, and an XSLT processor to transform 
your XML to HTML. Robin Cover’s XML Cover Pages at
http://www.oasis-open.org/cover/xml.html describe a number of
parsers and provide links to their websites. Many of these
are free. Most of the commercial XML editors have a parsing
function built in. The XML Cover Pages, an excellent resource
for all things XML, also discuss conversion tools for
transforming XML using XSLT. One of the most popular tools
is Saxon, which is available free from its website at
http://saxon.sourceforge.net/.
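
The two steps of checking a document and turning it into HTML can be tried out with nothing more than Python's standard library. This is a rough sketch under stated assumptions: the standard xml.etree.ElementTree module only checks that a document is well-formed, it does not validate against a DTD, and the hand-written conversion function merely stands in for what an XSLT processor such as Saxon would do with a stylesheet. The record, title and creator element names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape


def parse_if_well_formed(xml_text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


def record_to_html(root):
    """Turn a hypothetical <record> element into a small HTML fragment."""
    title = escape(root.findtext("title", default=""))
    creators = [escape(c.text or "") for c in root.findall("creator")]
    items = "".join(f"<li>{name}</li>" for name in creators)
    return f"<h1>{title}</h1><ul>{items}</ul>"


# Sample input using the invented element names.
SAMPLE = (
    "<record><title>Digital Preservation</title>"
    "<creator>A. Author</creator></record>"
)
```

A document with a missing or mismatched closing tag makes parse_if_well_formed return None, which is the kind of error a parser is meant to catch before the file is published.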




Text Editor and/or Word Processor : Text-editing 
programs are sometimes referred to as word processors. 
These programs allow you to create, format, and edit text. 
You can use any word processing program to write HTML. 
The text-editing program works with the OCR software to 
create text files. A word processor is what you need to save 
and edit the text that you will use for marking up your XML 
and HTML documents. Microsoft Notepad is a basic text editor 
that comes with the Windows operating system; however, it 
has limited capabilities. Microsoft Word and Corel 
WordPerfect offer more robust formatting features and 
provide enhanced text-editing tools such as spell-checking. 


Image Editor : Image editing programs are among the 
most valuable programs you can have when producing and 
editing images for archiving and publishing online. Image 
editors allow you to perform many tasks such as saving in 
multiple file formats, resizing images, enhancing photographs, 
cropping, creating images for your website, and so on. This 
is one tool that you cannot be without when starting and 
running a digitization center. When looking for an image editor 
there are a few things to keep in mind — most imaging 
programs are expensive. Adobe Photoshop is fairly 
expensive; however, once you purchase a licence, upgrades 
are less expensive than the retail version. 


Because Adobe Photoshop is such a powerful design 
tool, it requires long hours to learn its many features and 
capabilities. Jasc PaintShop™ Pro is an excellent program 
that performs many of the functions found in Photoshop, but 
PaintShop is a much easier program to use and learn than 
Photoshop, and is much less expensive. PaintShop Pro retails 
for considerably less than Photoshop. A good solution is to 
purchase one licence for Photoshop and use it when you need 
its capabilities and purchase PaintShop Pro for your other
machines.


Scanning Software : In order to ensure the proper 
operation of your scanner, you must be sure to install the 
software and drivers that are included with your scanner; the 
drivers and software work together to provide seamless and 
optimal operation of your scanner. However, most software
that accompanies scanners is limited in its capabilities and
functions, or is a trimmed-down version of a more powerful
program, and therefore additional third-party software is
needed.


Because certain file formats are required for archival 
purposes, you must have a program that supports and 
recognizes the specific formats required for digitization 
purposes. When looking for programs to supplement your 
scanner, make sure it is compatible with your operating 
system and the scanner. 


WOCAR 2.5 is a freeware program that scans images 
in the CCITT Group 4 TIFF compression scheme. WOCAR 
automatically defaults to this format, so you never need to 
change any file settings - just scan your documents directly 
into WOCAR to get your archival TIFF images. WOCAR is a 
good program to use, but it is limited in its functionality and 
use, so you should also consider other options if your 
resources allow. One alternative is to use a scanning suite 
that saves in several different file formats other than TIFF, 
and performs a variety of different functions. ScanSoft
PaperPort is an excellent scanning suite that combines many
useful functions for use in scanning. 


OCR Software : The OCR program is what you will 
use when you convert your archived images into text with 
your word processor. The OCR program takes the text from 
the archival image and converts it to text in your word 
processor. When it comes to OCR software, you may want 
to make sure you buy a program with a great reputation and 
proven track record. The better the OCR program, the more
time and money you will save. In addition, a good OCR
program makes fewer mistakes and allows you to save time 
when editing your text files. Two widely used and highly 
recognized OCR software programs are OmniPage Pro and 
Prime Recognition. If you are on an extremely limited budget, 
you can use WOCAR as your OCR program; however, it is 
not as accurate and robust as OmniPage Pro or Prime 
Recognition. WOCAR is a freeware program and is available
for download from the TUCOWS website at
http://tucows.wave.net.br/system/preview/234813.html.


FTP Software : An FTP or File Transfer Protocol 
program is what you will use when you are ready to upload 
your files and projects to the Internet for viewing. The FTP 
client connects to your server and allows you to create 
directories and move your files into the appropriate folder 
where they can be viewed on the Internet. There are several 
free versions of FTP software available for academic 
institutions, students, teachers, and government and non- 
profit organizations. WS_FTP LE is an excellent free FTP 
program. You can download it from
http://www.ftpplanet.com/download.htm.
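
The same upload can also be scripted. The sketch below uses Python's standard ftplib module rather than a graphical FTP client; the function names, and the host, user name, password and directory passed to them, are placeholders to be replaced with the details of your own server. It assumes that a failure to create the remote directory simply means the directory already exists.

```python
from ftplib import FTP, error_perm
from pathlib import Path


def upload_files(ftp, remote_dir, local_paths):
    """Upload files into remote_dir over an already logged-in FTP connection."""
    try:
        ftp.mkd(remote_dir)          # create the project directory
    except error_perm:
        pass                         # assumed: the directory already exists
    ftp.cwd(remote_dir)
    uploaded = []
    for path in map(Path, local_paths):
        with path.open("rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


def publish(host, user, password, remote_dir, local_paths):
    """Connect, log in and upload; all arguments are site-specific placeholders."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return upload_files(ftp, remote_dir, local_paths)
```

Keeping the upload logic separate from the connection step makes it possible to test the script without a live server.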


Page Layout and Design Software : Page layout 
programs are useful for when you want to create publications 
in-house and/or you need to digitize publications that are in a
page layout format. You can also use these tools to publicize 
your site. Page layout programs offer more design features 
than word processors. In addition, page layout programs are 
helpful if you want to create brochures, leaflets, business 
cards, etc. When using page layout programs you have the 
ability to make publications for your center in-house, and you 
will not have to pay to outsource designing for your publication
material. A few page layout programs that are available for
use are: Adobe PageMaker; Adobe InDesign; and
QuarkXPress.




PDF Software : PDF, or Portable Document Format, is 
a widely-used format that allows people to exchange 
information in instances where the other person might not 
have the proper program to open the file, or when a person 
does not want others to be able to modify and change their 
files. For digitization purposes, the PDF provides a quick and 
easy solution for getting files online for viewing and 
downloading. The PDF is not a replacement for an HTML or 
XML document and should be used as a companion with 
current formats to enhance site accessibility. Adobe Acrobat 
is the leading software program for creating a PDF. Acrobat 
provides many options when creating PDFs, and it works with 
Microsoft Office applications to create a PDF with just the 
click of a button. The attractive aspect of PDF files is that 
anyone with a computer can view them by downloading the 
free Adobe Reader from Adobe at http://www.adobe.com/ 
products/acrobat/readstep2.html. 

If you are not going to use free software, you
will need to purchase it. Purchasing software is a relatively
simple task. Locate retailers that offer the software you need, 
and check their prices against other retailers. Most online 
retailers are competitive and often run specials, so be on the 
lookout when you begin pricing software, and watch current 
trends, too, because a newer version may be coming out 
soon, and it would be a good idea to know release dates for 
new versions. It is also a good idea to check the vendors’ 
websites for possible upgrades or newer versions that could 
be coming out soon. You want to use the latest software 
versions available, and once you purchase a program, you 
benefit from free updates and support, and you are eligible 
for great discounts on upgrades to the latest version without
having to pay the full price each time you want to buy a new
program for your center.


When purchasing software for your center or the library, 
be sure it is compatible with your operating system, and that
your computers have the minimum requirements to run the
software. All you have to do is look on the box to see what 
platforms it is compatible with and what the minimum 
processor and memory requirements are. You can look at 
the vendor's website for more detailed information about the 
product in general. You can also look at online stores to see 
what the minimum requirements are to operate the software. 


The following list provides a reference to some of the 
software applications available for use in creating your digital 
publications and where you can locate them online: 


• HTML Editing Programs:
— Arachnophilia at http://www.arachnoid.com.
— Macromedia Dreamweaver at http://www.macromedia.com/software/dreamweaver.
— Microsoft FrontPage at http://www.microsoft.com/frontpage.
— Netscape Composer at http://wp.netscape.com/browsers/using/newusers/composer.
— NoteTab Pro at http://www.notetab.com.

• Imaging Programs:
— Adobe Photoshop at http://www.adobe.com/products/photoshop.
— Jasc Paint Shop Pro at http://www.jasc.com.

• Page Layout Programs:
— Adobe InDesign at http://www.adobe.com/products/indesign.
— Adobe PageMaker at http://www.adobe.com/products/pagemaker.
— QuarkXPress at http://www.quark.com.

• PDF Programs:
— Adobe Acrobat at http://www.adobe.com/products/acrobat.

• Text Editing/Word Processing Programs:
— Corel WordPerfect at http://www.corel.com.
— Microsoft Notepad at http://www.microsoft.com.
— Microsoft Word at http://www.microsoft.com/office/word.

• OCR Programs:
— OmniPage Pro at http://www.scansoft.com.
— Prime Recognition at http://www.primerecognition.com/.

• Scanning Suites:
— ScanSoft PaperPort at http://www.scansoft.com.
— WOCAR 2.5 at http://tucows.wave.net.br/system/preview/234813.html.

• XML Editing Programs:
— NoteTab Pro at http://www.notetab.com.
— XMetaL at http://www.softquad.com.
— XML Spy at http://www.xmlspy.com.

• XSLT Processors:
— Saxon at http://saxon.sourceforge.net/.

• FTP Programs:
— WS_FTP at http://www.ipswitch.com.

• Freeware and Shareware Programs:
— http://www.download.com.
— http://www.tucows.com.


7.10.2. Hardware 


Acquiring the right hardware is one of the most 
important tasks in building a digitization center. Purchasing 
quality products that are also cost effective might seem like a 
big challenge, but a little time and research can prove
rewarding. One of the first decisions in creating the
infrastructure of your digitization center is which computer
platform you want to use. Digitization functions perform well
on the Apple Macintosh and other platforms and operating
systems; however, PC- and Windows-compatible hardware
and software are good and easy to use.


When looking for the right hardware for your needs, 
ask yourself these questions: 


• How much money is allowed for purchases?
• What equipment is absolutely necessary?
• Where can the equipment be purchased?


Knowing how much money is budgeted can greatly 
affect the outcome of purchasing equipment. Make sure you 
only purchase equipment that is absolutely necessary - 
unnecessary extras can waste valuable funds. Knowing what 
companies are reputable and price competitive makes all the 
difference when shopping for hardware. Once you have 
answered these questions, you are ready to take your first 
steps toward making your digitization center a reality. 


7.10.2.1. Purchasing a Computer 


The first thing to know before buying a computer is what 
primary function it needs to perform. Once you have 
determined the need, you can begin customizing your 
machines. There are many companies and options to 
consider when purchasing a computer, and it is generally not
a good idea to buy a computer that you cannot fully customize.
When you are able to choose what components and how
much or little of something you need, you know exactly what
you are getting and what to expect from the components you 
selected. Most computers in retail stores are not customizable 
and are sold pre-packaged. Therefore, if you needed a 
different part or more or less of another part, you would have 
to purchase the extra parts and install them yourself, or 
remove or replace a component that you could have specified 
to already be in the machine. The best idea is to find a 
company or store that allows you to select the components 
you want in your computer and offers different choices on 
each component. Many online retailers offer great computers 
at competitive prices, and you can custom build your computer 
online to suit your specific needs. 


After you determine where you want to purchase your 
computer, you need to choose the model you feel will do the 
best job. The easiest way to determine this is to know exactly 
what components you need, and what systems are currently 
available that use the technology. The following is a list of the 
six main parts you need to be concerned with most — the 
processor; the memory; the hard drive; the monitor; the video 
card; and the optical drive. 


The processor is one of the most important components
of the computer; it determines how fast it can handle the input 
and send the information back to you. Most computers today 
are built using the Intel Pentium processor; however, the AMD 
Athlon is comparable to the Pentium. Intel also offers the 
Celeron, which is not as robust as a Pentium-class processor. 
Currently, a Pentium 4 processor that runs at 2.0 GHz or 
higher is sufficient to run all digitization functions. 


The more memory you have, the better your computer
performs. It is always a good idea to add extra memory to
enhance the performance of your computer. The minimum
amount you should consider is 512 MB, but make sure you
can add more in the future to keep your equipment running
smoothly and longer. We strongly recommend 1 or 2 GB,
however. Instead of buying a faster processor, you might
consider stepping down a few speeds to add more memory,
which provides a faster overall performance.


A large hard disk drive is also important. It helps you 
store all the necessary information from your operating system 
to the files you create for digitization. You should plan on 
having a separate drive dedicated to storing all your files, but 
you should also have a hard drive on your local computer 
that can handle the bulk of the files you are creating, and 
could serve as a backup storage facility if needed. Image 
files are large, and when scanning several images, you have 
the potential of losing valuable hard drive space. Most hard 
drives now are between 80 GB and 320 GB. 


Hard drives have also come down considerably in price, 
so the difference between an 80 GB drive and 160 GB drive 
is minimal. Doubling the size of your hard drive is worth the 
cost of a few extra rupees. Think of it this way — you can 
never have too much hard drive space. 
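
A rough calculation shows how quickly scanned images fill a drive. The sketch below is illustrative only: the 300 dpi resolution and 24-bit colour depth are assumed values, not recommendations taken from this chapter, the images are treated as uncompressed, and a gigabyte is counted as 1,000,000,000 bytes, as drive manufacturers do.

```python
def uncompressed_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Size in bytes of an uncompressed scan of the given physical dimensions."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def images_per_drive(drive_gb, image_bytes):
    """How many images of this size fit on a drive of drive_gb decimal gigabytes."""
    return (drive_gb * 1000**3) // image_bytes


# One letter-size page (8.5 x 11 inches) at an assumed 300 dpi in 24-bit colour.
page = uncompressed_image_bytes(8.5, 11, 300, 24)
print(page)                          # 25245000 bytes, about 25 MB per page
print(images_per_drive(80, page))    # 3168 pages on an 80 GB drive
print(images_per_drive(160, page))   # 6337 pages on a 160 GB drive
```

Under these assumptions an 80 GB drive holds only a little over three thousand page images, before the operating system and software are counted.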


A nice large screen is preferable when spending long 
hours in front of the computer. Digitization requires long hours 
of looking at the screen, so you should have a monitor that 
makes it as easy on your eyes as possible. The more space 
you have, the more you will be able to see onscreen. The
ideal size is at least an 18" viewable screen, but the resolution 
you set your monitor to can also determine how much you 
can view on the screen. If you have an 18" monitor, but the 
text and images appear too large, you need to adjust the 
resolution in the display properties to acquire the right balance 
of space and comfort to your eyes. The ideal resolution for 
an 18" monitor would be 1,280 X 1,024 pixels. This setting 
allows you to maximize your screen size; however, some
people might not like the smaller text and images. In this case,
you would want to set your resolution to a lower setting.




Since you will be looking at mostly text, you need to be 
able to see as much of the text as possible. A larger monitor 
provides this capability. Ideally, you want to have enough 
viewing area on your monitor to see two areas of text side by 
side, so you can use your text editing tools efficiently and 
accurately. 


Choosing the right monitor does not have to be a labour- 
intensive chore, but knowing exactly what you need gives 
you the ability to make a well-informed decision. The following 
is a list of some of the major differences between the newer 
LCD flat panel monitors and the older CRT monitors. There 
are only a few differences, but these differences are worth 
mentioning so you can make an informed decision before 
purchasing: 

— LCD monitors require less energy than CRT monitors. 
— LCD monitors use less space than CRT monitors. 


— LCD monitors generally provide a brighter, sharper 
screen. 


— LCD monitors are free from flicker and easier on the 
eyes than CRT monitors. 


— LCD monitors have fewer problems with glare than CRT 
monitors and can be effectively used in brightly lit areas. 


— LCD monitors are more expensive than CRT monitors. 
— LCD monitors are blurry at lower resolutions. 


— CRT monitors represent colour better than LCD 
monitors. 

— CRT monitors produce better image quality than LCD 
monitors. 

— CRT monitors handle moving images better than LCD
monitors because of a better response time and are
better suited for video editing.




Currently, the computer industry is continuing to 
improve the technology in the LCD monitor, and should have 
the resolution, colour representation, and response time 
issues solved and implemented in the near future. Until then, 
both types of monitors provide a good solution, but you need 
to base your decision on which type of monitor is best suited 
for your center’s specific needs and what you can afford. 


The video card and monitor work seamlessly together, 
but we wanted to separate them into two different categories 
because they are two entirely different components. The video 
card is where you plug your monitor into the computer. Having 
a good card makes all the difference on how well your monitor 
displays the text and images onscreen. The majority of video 
cards work fine with most monitors, but newer LCD flat panel 
monitors with a digital interface use the newer DVI (Digital
Visual Interface) port on the video cards. The older CRT monitors
plug into the VGA connection on the video card, and you
adjust your resolution based on the video card’s capabilities
and your monitor size and preferred resolution.


At present, there are two types of monitors and 
interfaces for video cards — the older analog CRT monitor, 
which uses the VGA input, and the newer digital LCD flat 
panel display monitor, which uses the VGA input and/or the 
newer DVI to connect the monitor to the video card. Older 
cards have the VGA-only input, but the newer video cards 
have both a VGA and DVI input. It is best to use CRT monitors
with the VGA input; however, you can use them via the DVI port with
a converter. LCD monitors mostly connect to the DVI port on
the computer, but some LCD monitors have both digital and
analog ports, while some have only digital and others only
analog. You can use either interface, depending on whether
your video card supports both interfaces or
only one, and a converter can always be used if needed.
Eventually, all video cards and monitors will use the DVI port
to connect to each other.


Optical drives are better known as CD and/or DVD 
drives. These drives have become an increasingly important 
part of the computer, especially with the capabilities of 
recording information to CDs and DVDs for several different 
purposes. 


Most computers today come standard with CD-RW 
drives and/or a DVD-ROM drive. The majority of CD recording 
today is done with the CD-R/RW drive and format. CD-R/RW 
means you can use two types of disks and/or formats in 
recording: a CD-R disk, which can be written to until it is full 
and not recorded to again, and the CD-RW disk, which can 
be recorded and erased and re-recorded to many times. Both 
types of disk hold up to 700 MB per disc. 


When considering what type of optical drive to have for 
your center or library, it is a good idea to have a CD-RW 
drive on all of your machines, and at least one machine should 
have a DVD-R drive. DVD-R is a record-once format that can 
record up to 4.7-16 GB of information per disk, and so holds 
considerably more information than one CD-R/RW disk. The 
only drawback is that, unlike a CD-R to which you can record 
until it is filled, the DVD-R can only be written to once and not 
added to again. The DVD-R drive can read standard DVDs
and can also be used as a CD-RW drive.
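
The difference in capacity is easiest to appreciate as a count of discs. The short sketch below uses the capacities quoted above (700 MB for a CD and 4.7 GB, taken here as 4,700 MB, for a single DVD-R); the 20,000 MB project size is an invented figure for illustration.

```python
import math

CD_MB = 700      # capacity of one CD-R or CD-RW disc, as quoted above
DVD_MB = 4700    # capacity of one DVD-R disc, using the 4.7 GB figure


def discs_needed(total_mb, capacity_mb):
    """Number of discs required to hold total_mb megabytes of files."""
    return math.ceil(total_mb / capacity_mb)


project_mb = 20000                       # hypothetical project size
print(discs_needed(project_mb, CD_MB))   # 29 CDs
print(discs_needed(project_mb, DVD_MB))  # 5 DVDs
```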


There are currently different DVD recording formats that 
you can choose from such as DVD-RAM, DVD+R, DVD+RW, 
DVD-RW, but they are not universally compatible with home 
DVD players or computer DVD-ROM drives, and only work 
on drives that support their respective format. However, 
DVD+R disks can be read by most DVD-based drives and 
support most DVD-based formats, but can only be recorded 
by a DVD drive that supports DVD+R. So, DVD-R or DVD+R 
are the best formats to choose because of their compatibility
and capabilities, and are compatible with most home DVD
players and computer DVD-ROM and DVD-R drives. 


The following is a brief checklist that can aid you in
selecting the right components for your computer: Pentium
4-class processor running at 2.0 GHz or higher speed; 1-2
GB minimum memory; 80 GB minimum hard drive; 18" or
larger monitor; 128 MB video card; and CD-RW and/or DVD-R
optical drive.


7.10.2.2. Purchasing a Scanner 


The process of acquiring the right scanner can 
sometimes be a little more tedious than finding the right 
computer. Price is a major factor to consider when purchasing 
a scanner, too. There are several different types of scanner 
that you should be familiar with, so you can understand their 
differences and capabilities. The following provides a list of 
different scanner types used for digitization — flatbed scanner; 
overhead scanner; sheet-fed scanner; and film scanner. 


A flatbed scanner is the most appropriate choice for 
use in digitization. They are widely used and versatile, and 
perform all the required tasks you need for digitization 
purposes. Flatbed scanners allow you to place single sheets 
or bound materials face down on the scan bed. Because you 
have the ability to scan bound materials effectively, flatbeds 
are the scanner of choice. In addition, flatbeds produce superb 
color and grayscale scans. 


An overhead scanner that can produce the same colour
scans as a flatbed scanner is extremely expensive. A flatbed 
that can scan almost the same amount of area as an overhead 
scanner is much less expensive; however, if you are dealing 
with extremely fragile materials and do not want to damage 
them any further, an overhead scanner provides a great
solution. If you purchase an overhead scanner for use and
cannot afford a model that scans in colour, make sure that it
scans in grayscale. An overhead scanner that scans in line
art or black and white only is not recommended for digitization 
purposes, especially for images. Overhead scanners are also 
large and bulky, and take up a great deal of space. 


A sheet-fed scanner is also not a good choice for 
digitization because you have to slide sheets of paper through 
the scanner, which makes it difficult to scan bound materials. 
Sheet-fed scanners also have the potential of damaging loose 
manuscripts, papers, and photographs, not to mention the 
excess handling involved that has the potential of damaging 
or ruining your materials. 


Finally, film scanners are great for photographs, slides
and negatives; however, film scanners are limited by size, 
and a 5 x 7 photograph is about as large as you can scan. 
You can use a flatbed scanner instead of a film scanner to 
scan archival and online quality images, and you can always 
purchase an adapter for your flatbed if you want to scan 
negatives and slides. 


When discussing scanners, one of the most important 
aspects to consider before purchasing is the actual 
dimensions that the scanner is capable of scanning. This is 
usually referred to as the scan area. Most scan areas are 
determined by inches and/or media sizes such as : 


• 8½ x 11 (standard letter);

• 8½ x 14 (legal);

• 11 x 17 (ledger).

Most 8½ x 11 inch flatbed scanners are reasonably
priced; however, their size is limited, and if you have
documents that are larger than the scanner’s bed, you have
to make use of another scanner. An 8½ x 14 inch scanner is
a better size for regular scanning. Its larger scan area is
better equipped to handle larger books and legal size
documents. The ideal scanner is one that can handle 11 x 17
inch or larger documents. Its large scanning area permits
you to scan more than one page at a time, and it handles
large and unusually shaped books better than a smaller
scanner.


A good alternative is to invest in one 11 x 17 inch
scanner and one or more 8½ x 14 inch scanners. This way
you can have at least one large-format scanner that you can
use for bigger jobs and a couple of smaller scanners to use
for smaller jobs. There are many 11 x 17 as well as 8½ x 14
scanners that are good choices for use in digitization.


The first thing to do when searching for the right scanner 
is to make absolutely sure that it is compatible with your 
current operating system, that its drivers work with your 
operating system, and that you are able to download driver 
updates from the vendor’s website. Once you determine your 
need based on the information about your operating system, 
you need to decide which interface you want to use. 
Digitization requires long hours using the scanner, so your
scanner needs to be fast and reliable. Depending on the make
and model of your scanner, it should be able to use a USB
port and/or a SCSI connection or, better yet, an IEEE 1394
FireWire port. You should avoid scanners for digitization
that use the parallel port (printer port) or the older USB 1.1
port. The newer USB 2.0 ports are much faster and can handle
the information more quickly. USB 2.0 is an acceptable
solution if you decide to use the USB port; however, make
sure that both the scanner and the computer’s USB port are
labelled Hi-Speed USB 2.0.


SCSI scanners were the fastest and most widely used
until a couple of years ago. Newer scanners and computers
are phasing out the SCSI interface, which is a good reason
not to invest in SCSI technology. However, if you currently
have equipment that is SCSI based and can support the
interface, you may want to use it as long as you can until you
upgrade to newer equipment. SCSI is still fast and can perform
well on older computers and operating systems, but the older
it gets, the less it will be used and supported.


IEEE 1394 FireWire is the newest and fastest 
technology available for scanning. It is comparable to the 
SCSI port for speed, and installation is easier to configure 
than most other interfaces. Newer scanners have the option 
of using this port. Most computers do not have or come with 
a FireWire card, so if you decide to go this route, be sure 
your computer comes with a FireWire card at the time of 
purchase, or you need to buy one and install it yourself. If 
you are lucky enough to purchase a scanner that includes a 
FireWire card, all you need to do is install it before you hook 
up your scanner. 


Scanner specifications information can sometimes be 
confusing and misleading. When looking at scanner 
specifications there are three important factors to consider: 
scan area; optical resolution (dots per inch); and colour depth 
(bits). 


Scan area is an important part of a scanner’s specification;
common scan areas are 8½ x 11, 8½ x 14 and 11 x 17 inches.
When referring to optical resolution or dpi (dots per inch),
you need to understand that a higher dpi (1,200 x 1,200) does
not by itself give you better colour scans than a lower dpi
(600 x 600). It should be noted that the higher the dpi, the
larger the image and the file size. When scanning colour
images for your online digitization projects, you should
scan at 300 dpi to keep images the same size as the original.
You can always scan at 400 dpi or 600 dpi if you would like
the image to be larger than the original. Black and white
images are a different issue; beginning with 400 dpi and
going as high as 800 dpi is sufficient for online display.
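
The relationship between scan area, dpi and file size can be worked out with simple arithmetic. The following is a minimal sketch in Python; the function names are illustrative, a megabyte is taken as 1,000,000 bytes, and the figures are for uncompressed images only, so a compressed TIFF or JPEG will be smaller.

```python
def scan_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a scan of the given area in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_size_mb(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size in megabytes (1 MB = 1,000,000 bytes)."""
    width_px, height_px = scan_dimensions(width_in, height_in, dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return total_bits / 8 / 1_000_000


if __name__ == "__main__":
    # A letter-size page (8.5 x 11 inches) scanned in 24-bit colour.
    for dpi in (300, 400, 600):
        w, h = scan_dimensions(8.5, 11, dpi)
        size = uncompressed_size_mb(8.5, 11, dpi, 24)
        print(f"{dpi} dpi: {w} x {h} pixels, about {size:.1f} MB uncompressed")
```

Doubling the dpi doubles both pixel dimensions, so the uncompressed size grows fourfold; this is why scanning at a higher resolution than the project needs fills storage quickly.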


Besides, the majority of scanners have a colour depth
between 36-bit and 48-bit. This is somewhat misleading, and
you need to know what this represents. When a scanner
states it scans in 48-bit colour, it is referring to the internal
colour depth that the scanner is capable of achieving. Most
programs and video cards are only capable of displaying 24-bit
colour. This means you need to know how well your colour
scanner is able to scan and display 24-bit images. However,
many 48-bit colour scanners produce better quality scans than
scanners that scan at a lower colour depth.
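
To see what these bit depths represent, note that a depth of n bits can encode 2 to the power n distinct values. The short Python sketch below (the function name is illustrative) prints the number of distinct colours for the depths mentioned in this chapter; in a 24-bit RGB image each of the three channels has 8 bits, and in a 48-bit image each channel has 16 bits.

```python
def colours_for_depth(bits):
    """Number of distinct values that can be encoded with the given bit depth."""
    return 2 ** bits


if __name__ == "__main__":
    for bits in (8, 24, 36, 48):
        print(f"{bits}-bit: {colours_for_depth(bits):,} distinct values")
```

An 8-bit image has 256 levels, which matches both 8-bit grayscale and the 256-colour limit of GIF; 24-bit colour gives 16,777,216 colours.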


So, it is a good idea to find a scanner that has the
following attributes: flatbed scanner; IEEE 1394 FireWire port
or Hi-Speed USB 2.0 connection; an 8½ x 14 or 11 x 17
inch scan area; a 36-bit colour depth or higher; and a
resolution of 600 x 1200 dpi or higher.


7.10.2.3. Digital camera 


A digital camera for your library or center can be a
valuable asset and a good investment. Digital cameras
provide several opportunities for
you to enhance and promote your center. For example, using 
a digital camera is a good way for you to take all the pictures 
of your center, staff, etc., for use on your website, or for 
promotional purposes. You would not have to rely on finding 
images to use, and you could have total creative control over 
the photos posted on your center’s website and other areas. 
However, the most important reason to invest in a digital 
camera for your center is for use on a project that has severely 
damaged materials, or for a project where the materials 
cannot be moved and you must perform the work where the 
materials are housed; it is for these two reasons that a digital 
camera could make a huge difference in your center. 


When considering what type or model to purchase,
there are a few things you should make sure the digital camera
can do:

• Make sure it supports the RAW image file format; TIFF
support in addition to RAW is a bonus.

• Make sure it has decent zoom lens capabilities.

• Make sure it has macro capabilities or is able to support
a macro lens attachment for close-up shots.


When looking at which image file formats a digital 
camera is capable of taking there is one image format that 
you should make sure your camera supports: the RAW image. 
Making sure your camera supports the RAW image format is 
a must because you can convert RAW images into archival 
TIFF images using your computer and certain software 
programs. The TIFF format is nice to have in addition to the 
RAW format because digital cameras can take 8-bit grayscale 
TIFF images. However, colour TIFF capabilities on most 
digital cameras today only go as far as 16-bit colour images. 
You need to have at least 24-bit colour TIFF images for 
archival purposes. So, when looking for a digital camera, 
make sure that it supports the RAW image format, and if it 
has the ability to shoot TIFF images consider it a nice addition. 


A digital camera that supports the TIFF image format 
can usually take 8-bit lossless compression grayscale images 
in the TIFF mode, and the RAW format provides you with a 
way to save RAW images as lossless TIFF images in colour 
or black and white once you download them to your computer. 
If the camera does not take TIFF images but it does take 
RAW images then that is acceptable, but just make sure you 
can purchase Adobe Photoshop for saving your colour TIFF 
images in at least 24-bit colour. The RAW format is excellent 
because it is basically an unprocessed image, or it can be 
viewed as a digital negative. However, the RAW format is 
not universally accepted or a completely defined standard; 
the RAW format differs by manufacturer, i.e. Canon, Nikon, 
Minolta, Sony, etc., and even by camera make and model. 
However, this proprietary format is accompanied by software
from your digital camera’s manufacturer so you can process
the image from your computer, and Adobe Photoshop CS
includes support for working with RAW images.


RAW and TIFF images are large, so be prepared for a 
longer waiting period while the images are written to the 
camera’s memory card. In addition, make sure you have a 
large memory card and possibly a few extra ones because 
these formats can take up a lot of room quickly. The setting
you choose, and whether the image is colour or black and
white, play an important part in how large each
photograph will be. You can count on the image file sizes to
be around 5-20 MB per image. You can see how quickly your 
memory would get used up, and why the writing times to the 
memory card would be longer. A 512 MB or larger memory 
card for your digital camera should provide you with a good 
starting point, but depending on the project, location and 
computer access, you might want to invest in a card that is 1 
GB or larger, or have multiple memory cards in different 
memory configurations. 
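
The effect of image size on card capacity can be estimated as follows. This is a rough Python sketch using the 5-20 MB per image range given above; it assumes that the whole nominal capacity of the card is usable and that 1 GB is 1,024 MB, so real figures will be somewhat lower. The function name is illustrative.

```python
def images_per_card(card_mb, image_mb):
    """Whole number of images of the given size that fit on the card."""
    return card_mb // image_mb


if __name__ == "__main__":
    for card_mb in (512, 1024):
        most = images_per_card(card_mb, 5)    # smallest images in the range
        least = images_per_card(card_mb, 20)  # largest images in the range
        print(f"{card_mb} MB card: between {least} and {most} images")
```

At 20 MB per image a 512 MB card holds only about 25 photographs, which is why extra or larger cards are worth having for projects carried out away from a computer.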


Besides, your digital camera should have a zoom lens
that is about the equivalent of a standard 35 mm film zoom
lens, which means it can zoom to 140 mm or longer. This
helps you zoom in on objects that are farther away when
taking photographs. Having a macro mode or macro lens
attachment available for your camera is extremely important
because it allows you to take extremely close-up shots. This
comes in very handy when taking archival images of pages
that you need to OCR. Most digital cameras have the macro
function built into the camera. Look for a flower icon on the
camera or in the camera’s menu function. If the camera does
not support a macro function then a separate lens attachment
is required. You need to make sure that the camera you
choose can either support an additional macro lens or a macro
lens attachment. In addition, make sure there is a macro lens
or macro attachment available for your camera.




Digital cameras have come a long way since their 
inception. Prices have come down considerably and image 
quality has drastically improved. You should make it a point
to research current models and invest in a high-quality digital 
camera for your center. In the event you cannot afford a digital 
camera or one that has the functions and capabilities you 
need, you can always use a film camera or make prints from 
a digital camera and scan them into the computer. This 
method is more time consuming, and could end up adding 
extra costs to a project, but it is another method that could be 
explored if needed. 


When looking at a digital camera for digitization
purposes, it is a good idea to find a camera that has the
following attributes: support for the RAW image file format;
the TIFF image format, which is nice to have in addition to
RAW but not essential; macro lens or macro mode capabilities;
third-party imaging software such as Adobe Photoshop; a zoom
lens with a 35 mm film equivalent of 140 mm; and a memory
card of at least 512 MB.


7.10.2.4. Other Hardware Considerations 


Although the following components are not required to 
perform digitization tasks, you should take into consideration 
that disasters are possible and you need to protect your 
investment. In addition, you may also need to use a printer 
occasionally. The following list provides a few extra
components that you should take into consideration when
detailing a hardware list: uninterruptible power supply (UPS)
or battery backup; backup device (tape backup unit, CD-R,
DVD-R); and printer.


It is extremely important that each computer, scanner,
and/or other device is connected to a surge protector. A
better alternative to a surge protector is an
uninterruptible power supply (or battery backup) for your
machines. Most battery backups include surge protection and
also keep the machine powered on until you can safely
shut it down. If you cannot afford to purchase uninterruptible
power supplies for each machine, make sure you have at
least one for the computer or server that stores all of your
digitization files.


Backing up your files is an absolutely necessary
operation that you must perform routinely in order to have
your files safely stored in case of disaster. In addition, it is a
good idea to keep backed up files off-site in case of a fire or
other disaster. You can use CD-R and DVD-R media as
a way to archive your files; this is a good way to back up your
files intermittently and store them off-site, but not a good
idea on a daily basis. You would spend far too much money
on discs and have to sit at the computer to copy the
information manually. You can also use external hard disks,
which are available in capacities of up to 250-320 GB, to
back up your data.


Printers are also an indispensable tool and
should have a place in your digitization center. A printer
helps with editing and training, and can be used
when you find articles from websites and want to use them 
as a resource or reference. In addition, you will probably need 
to write letters occasionally, so having a printer for these 
purposes alone is worth it for your center. When deciding 
what type of printer to use in your digitization center, we 
recommend that you use a laser printer. Ink printers are too 
slow and the ink is expensive. Laser printers are much faster 
and cost less to maintain. 


The following URLs provide excellent information on 
the hardware and on ordering and pricing equipment: 


• Computers:

- http://www.cnet.com
- http://www.dell.com
- http://www.gateway.com
- http://www.pcmag.com
- http://www.pcworld.com
- http://www.zdnet.com

• Scanners:

- http://www.epson.com
- http://www.hp.com
- http://www.microtekusa.com
- http://www.scanstore.com
- http://www.scantips.com
- http://www.umax.com
- http://www.visioneer.com

• Digital cameras:

- http://www.canon.com
- http://www.dcresource.com
- http://www.dcviews.com
- http://www.dpreview.com
- http://www.minolta.com
- http://www.nikon.com
- http://www.olympus.com
- http://www.pentax.com
- http://www.sony.com


7.11. DIGITIZATION PROCESS

Digitization is a long and complicated process. There
are many steps involved, as illustrated in the flowchart shown
in Fig. 7.4. Every project is different, but the four basic stages
include the following:


• Stage 1: select material.

• Stage 2: convert normal text into electronic text.

• Stage 3: format electronic text for the internet.

• Stage 4: create website for access and navigation.


A detailed description of each of the steps is given 
below: 


7.11.1. Selection of Materials 


Materials must go through a selection process in order 
to be considered for digitization. When selecting materials 
for your center it is a good idea to understand that the material 
has to meet certain requirements. To determine eligibility for 
the Library, materials should fulfill the following criteria: 


• It should meet the research needs of faculty, students,
and scholars within and beyond the library community.
In assessing what material meets the needs of our
library, consideration should be given to the scholarly
content of the material, the uniqueness of the material,
and the demand for the material.

• It should benefit from increased access and should
contribute to the Library’s service and collection
development missions. Materials that are difficult to
access in their original formats or that would benefit
from increased speed or depth of access via electronic
delivery formats should be given priority.

• It should have clear ownership and copyright clearance.
Before undertaking a digitization project, the Library
needs to secure sound legal advice about the ownership
and rights to reproduce or publish materials
electronically.


[Figure: flowchart of the digitization process, Stages 1 to 4.
Box labels recoverable from the original: Selection of
materials; Obtain copyright clearance; Preparation and
conservation of print materials; Scan documents; OCR (Optical
Character Recognition); Proofread text for accuracy; Design
website and interface; Prepare stylesheets for XML to HTML;
Present finished product online.]


Fig. 7.5. EPC Digitization Flowchart4 


• It should be of interest to potential partners. Materials
that would be of interest to campus and outside
partners, both collaborators on the content and potential
sources of funding and other support, should be given
strong consideration.




In addition, before selecting materials, their
preservation should be considered from the following
perspectives: (a) items should not be digitized where the
scanning process would be detrimental to the item itself;
(b) items that receive heavy patron use and are quickly
deteriorating should be selected for imaging in order to
preserve the original. Although data migration is an ongoing
concern, original editions will not be considered reformatted
to preservation quality levels by digitization until the
technological issues have been resolved and appropriate
standards are widely accepted.


The Collection Development Committee should make 
decisions as to which suggested materials will be chosen for 
digitization. In addition, the committee should employ 
established collection development criteria and policies. 
Selection for digitization requires that materials have enduring 
value and be available in a sufficient number or quantity that 
they form a significant and unique research corpus. 


7.11.2. Preservation/Conservation of Originals 


Great care and consideration must be taken when
preparing original materials for digitization. In addition, you
must exercise extreme caution when performing the actual
digitization of the materials. When materials undergo
digitization, the process could damage the originals; you do
not, under any circumstances, want to further damage rare
materials. Digitization is a process of retaining the key
attributes of the original material, and digitization should in
no way be seen as a replacement of printed materials. The
main goal of digitization is to provide greater access to
materials that are rare and to act as a supplement to the
original.


There are some standards which allow for the
dissemination of knowledge through formats that conform to
specific guidelines for usage. Standards also provide us with
the option of how we would like to have our materials displayed
online and what formats are available for us to choose. Tables
7.3 to 7.5 represent available formats and guidelines that you
can choose from when planning your projects for archiving
and online viewing.


Table 7.3. Image formats

File name and      Extended name          Description
extension

TIFF               Tagged Image File      The CCITT Group 4 compression
.tif and .tiff     Format                 scheme is the standard of choice
                                          for black and white archival
                                          images of text - the TIFF is
                                          also used for archiving colour
                                          images using a 24-bit colour
                                          depth in an LZW compression
                                          scheme, or completely
                                          uncompressed.

JPEG               Joint Photographic     The JPEG should only be used
.jpg and .jpeg     Experts Group          for online and/or viewing
                                          purposes - it is also the most
                                          widely used and accepted image
                                          file on the Internet.

GIF                Graphics Interchange   GIF files should only be used
.gif               Format                 for online and/or viewing
                                          purposes - limited to 256
                                          colours.


After you have completed housing your center and
setting up the framework of your computer systems, it is time
to work on the actual structure of your center’s projects.


Table 7.4. Text formats

File name and      Extended name          Description
extension

Text               Plain Text             The plain text file is the
.txt                                      format of choice for storing
                                          text-based documents using the
                                          ASCII standard.

RTF                Rich Text Format       A text file that has formatting
.rtf                                      features much like Word and
                                          WordPerfect, but is not as
                                          robust as Word or WordPerfect.

Word Document      Microsoft Word         Word allows the user to create
.doc               Document               and edit text more effectively
                                          and efficiently than a simple
                                          text editor - useful for
                                          creating and editing plain text
                                          documents.

WordPerfect        Corel WordPerfect      WordPerfect allows the user to
Document           Document               create and edit text more
.wpd                                      effectively and efficiently
                                          than a simple text editor -
                                          useful for creating and editing
                                          plain text documents.


Because your projects will be viewed online, you must
develop a cohesive manner in which your site is to be
displayed. Before any of the physical work begins on your
projects, you must first define how you are going to set up
the file structure of each project. Your projects need to reside
on a file server that your staff can access on a daily basis
and exchange files and information with. Once you have a
file server in place, you need to create a folder or folders on
your server that specifically pertain to each project and/or
work-related material. If you have one or more projects lined
up, you need to create separate root directory folders for each
project, and only store information directly related to the
project inside the folder.


Table 7.5. Markup Formats

File name and      Extended name          Description
extension

HTML               Hypertext Markup       Uses codes and tags to make
.htm and .html     Language               web pages, which are the basis
                                          of the World Wide Web.

XML                Extensible Markup      Uses codes and tags to create
.xml               Language               and archive electronic
                                          publications - also used in
                                          electronic commerce.

SGML               Standard Generalized   An early standard for markup
.sgm and .sgml     Markup Language        languages, from which HTML and
                                          XML derive - it uses tags to
                                          identify a document’s
                                          structure.


But the important thing here is that the way you set up 
your directory structure for your HTML files for each project 
is exactly how it will be navigated when it is online. From the 
earliest point in a project's life, it is extremely important that 
you plan carefully when setting up a directory structure. 
Navigation is critical to a project’s success, and spending 
quality time at this point is a necessity. In addition, when you 
do things right the first time, you will not have to go back later 
and correct each document, which takes time and resources, 
and could cause errors or problems if not thoroughly checked. 


To begin, you need to outline exactly what goes into a
project, and everything about each particular project goes
into the one folder you created to house that project. So, the
first thing you need to do is create your root directory or main
folder that contains all the information on the project. Once
you have created the root directory, it is time to create
subfolders within the root directory. At this point, all you need
to do is click or double click the root directory folder and create
new folders within that folder. Repeat the above steps each
time you want to create a new folder or subfolder. When
beginning a project that involves scanning a book or multiple
documents, you need to create four separate subfolders in
that project’s root directory folder. If your project has multiple
volumes or additions, you can create a subfolder for each
volume, and then place the four basic folders in each volume’s
folder, or you can create the four basic folders in the root
directory and place each volume’s folder within that folder.
The four folders you need to create are as follows: image_files;
text_files; xml_files; html_files.
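
The same folder structure can also be created by a short script rather than by hand. The following Python sketch is one possible way of doing it, not a prescribed method; the function name, the project name and the volume names are hypothetical, and only the four folder names come from the text above. It follows the first of the two layouts described, with the four basic folders inside each volume’s folder.

```python
import tempfile
from pathlib import Path

# The four basic folders named in the text.
SUBFOLDERS = ("image_files", "text_files", "xml_files", "html_files")


def create_project(root, volumes=()):
    """Create the four basic folders in the project root, or in one
    subfolder per volume when volume names are given."""
    root = Path(root)
    parents = [root / volume for volume in volumes] if volumes else [root]
    created = []
    for parent in parents:
        for name in SUBFOLDERS:
            folder = parent / name
            folder.mkdir(parents=True, exist_ok=True)  # safe to run twice
            created.append(folder)
    return created


if __name__ == "__main__":
    # Demonstration in a temporary directory; "sample_project" and the
    # volume names are hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_project(Path(tmp) / "sample_project",
                                     volumes=("volume_1", "volume_2")):
            print(folder.relative_to(tmp))
```

Because existing folders are left alone, the script can be run again when a new volume is added to a project.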


The use of an underscore (_) is important for
accessibility and to avoid having an empty space in a file
name. You should use an underscore any time you need to
connect words when naming a file. For example, if you want
to name a file or folder ‘to do list’, you should use underscores
where the words break: to_do_list.
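
If file names are generated by a script, the underscore rule can be applied automatically. The following is a minimal Python sketch; the function name is illustrative, and the only rule it implements is the one given above, replacing the breaks between words with underscores.

```python
def to_file_name(label):
    """Join the words of a label with underscores so that the
    resulting name contains no empty spaces."""
    return "_".join(label.split())


if __name__ == "__main__":
    print(to_file_name("to do list"))
```

Runs of several spaces, and spaces at the start or end of the label, are treated as a single word break, so no doubled or trailing underscores appear in the name.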


These four folders contain all the information about the
project. The image_files folder contains all the archival
TIFF images that are created when scanning the original
documents. The text_files folder contains all the text files that
are OCR’d and proofed against the original documents. The
xml_files folder contains all the XML files that are marked up
from the text files, and finally the html_files folder contains all
the HTML documents that are converted from the XML files
for online viewing. This structure identifies file types, and helps
you and your staff locate particular files and keeps them
separate from files with other extensions. So, text files would
only go in the text_files folder, and so on. There might be
times when you need to keep other types of files associated
with a certain project inside the project’s folder. That is fine
as long as you put them in the root directory or create a
separate folder in the root directory to keep them separate
from the actual project files. Naming files and folders is
another important area of information architecture.
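
The rule that each file type goes only in its own folder can be expressed as a simple lookup on the file extension. The Python sketch below is an illustration only; the function name and the folder name other_files are hypothetical, standing in for the separate folder mentioned above for files that are not part of the actual project files.

```python
from pathlib import PurePath

# Extension-to-folder mapping based on the four project folders.
FOLDER_FOR_EXTENSION = {
    ".tif": "image_files",
    ".tiff": "image_files",
    ".txt": "text_files",
    ".xml": "xml_files",
    ".htm": "html_files",
    ".html": "html_files",
}


def folder_for(file_name, other="other_files"):
    """Return the project folder a file belongs in, judged by its
    extension; anything unrecognised goes to a separate folder."""
    extension = PurePath(file_name).suffix.lower()
    return FOLDER_FOR_EXTENSION.get(extension, other)


if __name__ == "__main__":
    for name in ("page_001.TIF", "chapter_1.txt", "chapter_1.xml",
                 "index.html", "project_notes.doc"):
        print(name, "->", folder_for(name))
```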


7.11.2.1. Scanning or rekeying 


Scanning is the first step toward putting a document 
into an electronic format. Through scanning we are able to 
take the printed document and digitize its contents so we can 
work with the text and images to prepare the document for 
presentation and publication on the Internet. Be aware that 
we use the term image to refer to two different items. The 
first is the product of a scanned document. For instance, the 
item you place on the scanner is the printed document, but 
the picture you produce on screen once you have scanned 
the document is the image. The second item we call an image
is almost anything within a document that is not text; this term 
includes photographs, drawings, pictures, maps, graphs, 
charts, and so forth. 


Whether to scan or rekey the materials is decided by
the condition of the materials.
Extremely fragile materials, anything printed before 1940 
(some material before 1940 is capable of being scanned, but 
it cannot be handwritten and must be in good condition), and 
any manuscripts probably will have to be retyped, because 
the optical character recognition (OCR) software used to 
convert a scanned image to text is sometimes unable to 
recognize the textual characters. Overhead scanning devices 
are less damaging to books than flatbed scanners. In extreme 
cases of fragility, or if extremely frail materials cannot be 
moved to your center for obvious reasons, a digital camera is 
another means of getting material into electronic format. If 
the print is clear enough to OCR, the documents are scanned, 
OCR’d, then saved as text files. Whether scanned and OCR’d 
or rekeyed, all text is proofread. 




Once you have your digitization center’s infrastructure 
in place and a project lined up, you are ready to take the first 
step in creating your online publications. Scanning is the 
process of transferring your printed materials into electronic 
format. Scanning is usually a long, slow process that can take 
weeks or even months to complete depending on the size of 
your project. A large scanner can cut down the amount of 
time needed for scanning material into the computer. 
Scanning pages of text is repetitive and mundane; the less 
time it takes to scan your material, the better. 


There are differences in scanning text for archiving and 
scanning images for display. When scanning material for 
archival purposes there are a few things you need to know. It 
is best to scan text at 400 dpi, in black and white document 
(not photo) settings, and to save the image in the TIFF format 
using the compression scheme CCITT Group 4. The CCITT 
Group 4 TIFF format is best suited for archiving because it 
keeps the image file sizes small. Other formats and 
compression schemes create large file sizes, and this would 
mean you could save less in your storage space. Also, large 
files take time to open and are less stable when larger than a 
few megabytes. In addition, large image file sizes corrupt 
easily and could cause you to lose valuable information by 
not allowing you to open the files you created or causing your 
operating system to crash. Thus try to keep your archived
images as small as your archival format allows; however, this
does not imply that you should scan at the minimum
requirement allowed by your format - you should always scan
at what is recommended for archiving in your chosen format.


Scanning images for display differs from scanning for
archival purposes. When scanning black and white or colour
photos, make sure you adjust your setting to scan for
photographs. There are also adjustments between black and
white and colour photos; choose the appropriate setting for
the type of photograph you are scanning. Black and white
photos can usually be scanned at 300 dpi or 400 dpi, and up
to 800 dpi depending on how large you want the image to be.
Black and white photos and text documents should always
be scanned in 8-bit grayscale. Colour photos should be
scanned at 300 dpi or 400 dpi. When scanning colour photos
for archival purposes make sure that your scanner scans in
at least 24-bit colour and you save the image in TIFF format.
All black and white and colour TIFF images for archiving
should always be saved using a lossless compression
scheme such as LZW or PackBits. CCITT Group 3, CCITT
Group 4 and Modified Huffman, which are fax compressions,
are also lossless compression schemes, but are only used
for saving black and white text documents.
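
The scanning recommendations in this section can be collected into a small lookup so that staff apply the same settings each time. The Python sketch below is only a summary of the figures given in the text; the class, field and key names are illustrative. The entry for text follows the archival recommendation given earlier (400 dpi, black and white document setting, CCITT Group 4), and the lowest recommended dpi is used where the text gives a range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanSettings:
    dpi: int            # starting resolution; the text allows higher values
    mode: str           # scanner colour mode
    file_format: str    # archival file format
    compression: tuple  # acceptable lossless compression schemes


# Values taken from the recommendations in this section.
ARCHIVAL_SETTINGS = {
    "text": ScanSettings(400, "black and white (document)", "TIFF",
                         ("CCITT Group 4",)),
    "bw_photo": ScanSettings(300, "8-bit grayscale", "TIFF",
                             ("LZW", "PackBits")),
    "colour_photo": ScanSettings(300, "24-bit colour", "TIFF",
                                 ("LZW", "PackBits")),
}


def settings_for(material):
    """Look up the recommended archival settings for a kind of material."""
    try:
        return ARCHIVAL_SETTINGS[material]
    except KeyError:
        raise ValueError(f"no recommendation recorded for {material!r}")


if __name__ == "__main__":
    for material in ARCHIVAL_SETTINGS:
        print(material, settings_for(material))
```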


If the image was satisfactorily scanned, all you have to 
do now is rename the file using the file naming standard you 
created and save the file to the server or drive that is housing 
your project. Be sure to save the file in its designated folder 
inside its project’s folder. 


However, sometimes the materials you want to digitize
might not be in good repair, might contain handwritten notes
or additions, or might be completely handwritten. Handwriting
does not OCR well, and needs to be rekeyed by hand. Rekeying
is the process of taking a document and physically typing the
information it contains directly into your word processor.
Rekeying is extremely involved and time consuming. If a
document or documents from a project need to be rekeyed,
then you need to allot extra time for the rekeying of the text
and for proofreading. When rekeying text, you should
open your word processing program and begin typing the
text exactly as it appears on the document. Be sure to
preserve the structure and content as closely as possible to
the original. Do not hurry or feel rushed when rekeying
text; take the necessary time to make sure you do it right as
you go. Always use the spell-checker in your word processor
to help ensure fewer mistakes; however, it is not a good idea 
to rely solely on the spell-checker to catch spelling errors 
because it will not catch all misspelled words or certain 
variants of words. For example, commonly overlooked
pairs include form and from, you and your, and site
and sight. These are just a few examples of why it is important
to proofread the text in addition to using the spell-checker. 
Rekeying is a long, slow process and should only be 
performed when absolutely necessary. If you need to rekey 
text, consider outsourcing the work; there are a number of 
companies that provide this service. Most companies send 
the materials to another country such as India to be rekeyed. 
They find that typists make fewer mistakes when they do not 
speak the language they are rekeying. You can always count 
on rekeying handwritten material, but the use of a digital 
camera might provide a solution or alternative to rekeying 
older or damaged printed materials. 


7.11.2.2. Imaging 


To make an image that is easily viewable and retains 
as much of its original qualities as possible for display on the 
Internet, we usually use Jasc Paint Shop Pro or Adobe 
Photoshop to resize, crop, and/or edit before making the 
image available online. This way we can preserve the integrity 
and clarity of the picture for online viewing. 


Imaging can be one of the most complicated areas in 
digitization, but it is also one of the most rewarding. Presenting 
nice, clear images online enhances the visual appeal of your 
site, and creates interest in the way you present your 
photographs to your online viewing audience. There are 
basically two main areas of imaging for digitization centers: 


— archival imaging; 


— presentation imaging. 


Archival imaging consists of scanning your documents 
and photographs into their respective archival image format. 
You can use the CCITT Group 4 compression scheme for 
text-based images and a lossless TIFF compression scheme 
such as LZW for photographic images. Black and white TIFF 
text-based images and photographic images should be 
scanned in 8-bit grayscale. Colour TIFF images should be 
scanned in 24-bit colour. Again, any time you scan for archival 
purposes, be sure to scan using a lossless compression 
mode. Scanning TIFF images completely uncompressed is 
another way to save images for archival purposes, but be 
forewarned that uncompressed TIFF images are extremely 
large. It is not uncommon for one 24-bit colour uncompressed
TIFF image to be around 40-100 MB or larger, especially if
you scan at 400 dpi or higher. To work effectively and
efficiently with uncompressed TIFF images, you need a fast
computer with plenty of memory.
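

The file sizes quoted above can be checked with simple arithmetic. The sketch below estimates the raw pixel data of an uncompressed scan from the page size, the resolution and the bit depth. The function names and the letter-size page (8.5 x 11 inches) are illustrative assumptions, and the estimate ignores the small overhead of the TIFF header.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the raw pixel data size of a scan, ignoring file headers."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def to_megabytes(size_bytes):
    """Convert bytes to megabytes (1 MB taken as 1,000,000 bytes)."""
    return size_bytes / 1_000_000


# Hypothetical example: a letter-size page scanned at 400 dpi.
colour = uncompressed_size_bytes(8.5, 11, 400, 24)
gray = uncompressed_size_bytes(8.5, 11, 400, 8)
print(f"24-bit colour:   {to_megabytes(colour):.1f} MB")  # 44.9 MB
print(f"8-bit grayscale: {to_megabytes(gray):.1f} MB")    # 15.0 MB
```

A 24-bit colour page at 400 dpi therefore falls inside the 40-100 MB range mentioned above, while the same page in 8-bit grayscale is one third of that size.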


This process is relatively simple — you can use your 
scanner to scan the documents into the TIFF format through 
the scanning program that allows you to save into the Group 
4 TIFF format. WOCAR and ScanSoft’s PaperPort are two 
programs that allow you to scan into CCITT Group 4 TIFF 
format. Once you have named and saved your image, you 
can use an imaging program to make any necessary 
adjustments. One of the most frequently performed tasks is 
straightening your images. When you scan bound material it 
is often difficult to get it as straight as you want it to be. This 
is where the magic of imaging comes to the rescue. 


Just about every imaging program on the market has 
the functions necessary to straighten or align crooked images. 
This is important for several reasons. First, if you ever decide 
you want to make the actual page images available for your 
patrons to view, you will definitely want to make sure each
and every image is as straight as it can possibly be. Second, 
when archiving information that is going to be around for future 
generations to see and possibly work with, it is important that 
the work you perform be to the absolute best of your abilities. 
You do not want to leave behind a legacy that shows 
haphazard care or concern. Taking your time at the beginning 
makes all the difference, and it saves you countless hours of 
labor and frustration if you have to go back and fix and/or 
repair any problems, not to mention the fact that you might 
not have the original materials around to make the corrections 
you need or want to make after the project has been 
completed. 


Once you have scanned your images for archival 
purposes, you are ready to enhance or make any necessary 
changes to the images with a photo editing program. You 
can take your images and straighten them if need be, or create 
your derivative JPEG images for web display. Or, if you prefer, 
you can rescan just the photo, chart or other images within 
the text and save them into the JPEG format after you scan 
and resize the image to the size you want to have displayed 
online. Photo editing allows you to adjust and get just the 
right look or format for your center. 


The next is Presentation Imaging, which generally 
involves taking the archival TIFF image and using an image 
software program such as PaintShop Pro or Photoshop to 
edit, resize, and save as a JPEG for online display. In some 
cases, you might want to rescan the images for display-only 
purposes. Sometimes it is easier to just scan an image rather 
than to edit it with software. In some cases you can use a
digital camera for presenting your images online. Whichever
method you choose, there are a few basic steps you should
know that can assist you in creating nicely detailed images
for online presentation. These are described below:


Straightening Images: An image editing program such as
PaintShop Pro or Photoshop allows you to straighten the
image to your liking. When using your image editing program,
look for a Rotate or Straighten Image function. The Rotate
function allows you to specify exactly how much or little you
want to rotate the image. Simply indicate the direction and
how many degrees you would like to rotate the image, and
the image rotates to its new position. If it does not come out
as straight as you like, keep rotating it until you get it as straight
as you want. The Straighten Image function straightens the
image automatically; if you like the result you can save the
image and use it, or, if you do not like the result, you can
rotate it manually using the Rotate function. Rotating an image
should be the first step when editing an image.


Resizing Images: When an image is too large, resizing 
can help you attain the right picture size for your imaging 
purposes. TIFF images are much better suited for resizing 
than JPEG images. The TIFF image contains much more 
information than a JPEG, and the TIFF can be manipulated 
in several different ways while still maintaining great image 
quality. The JPEG is not easily manipulated in some 
instances, and it is better to work with a TIFF when creating 
images for online display. It is also important to note that when 
you are finished editing a TIFF for online viewing, you will 
save it in JPEG format. Thus you are able to keep your original 
TIFF file, make the necessary changes to the image using 
your imaging software, and then save the image as a JPEG. 
Be careful not to save over your TIFF as you are editing it 
because you do not want to make any changes to your 
archival image. It is a good idea to copy the images you are 
editing into a separate folder, so you do not have to worry 
about accidentally saving over your archival image. If you 
are starting from scratch, scan your images in as TIFF files,
and then begin editing them.


Resizing images is a fairly simple process. Open the
TIFF file in your image editing program and select the Resize
function. There are a few ways you can go about resizing 
your image. The easiest way to resize your image is by 
percentage. For example, if your image is twice as large as 
you want, you can choose to reduce it by 50 percent to get it 
to the required size. Sometimes it takes some tinkering with 
the percentage sizes to get the size you want. You can always 
resize, then resize again. Or, you can choose lower or higher 
percentages until you find one that resizes your images best. 
You can always undo any resizing by clicking the back arrow 
button from the toolbar. This helps you avoid closing and 
reopening images. 


You can also resize by pixel size if you are more
accustomed to using pixel sizes or know the exact pixel size
at which you would like to display your images. Simply enter
the pixel size for the image and click OK.


Another way you can resize an image is by the actual 
or print size of an image. This method allows you to reproduce 
the image at the exact size that it appears in the original printed 
material or photograph. You can normally choose from inches 
or centimeters. All you need to do is enter the width and height 
of the original and click OK. 
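

The three resizing methods described above (by percentage, by pixel size, and by print size) all reduce to computing a new pair of pixel dimensions. The following is a minimal sketch of that arithmetic only; the function names are illustrative, and the display resolution used for print size is an assumption, since an image editing program takes it from the image itself or asks for it.

```python
def resize_by_percent(width_px, height_px, percent):
    """Scale both dimensions by the same percentage."""
    scale = percent / 100
    return round(width_px * scale), round(height_px * scale)


def resize_to_width(width_px, height_px, target_width_px):
    """Resize to an exact pixel width, keeping the aspect ratio."""
    scale = target_width_px / width_px
    return target_width_px, round(height_px * scale)


def resize_to_print_size(width_in, height_in, dpi):
    """Pixel dimensions that reproduce a print size at a given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


# A hypothetical 3400 x 4400 pixel archival scan.
print(resize_by_percent(3400, 4400, 50))   # (1700, 2200)
print(resize_to_width(3400, 4400, 800))    # (800, 1035)
print(resize_to_print_size(4, 6, 96))      # (384, 576)
```

Scaling both dimensions by the same factor keeps the proportions of the original, which is why a single percentage or a single target width is enough to define the new size.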


Cropping Images: Cropping an image allows you to
take out any unnecessary space on the image, and gives
you the option of displaying only what is needed. Another
nice feature of cropping is that it also reduces the file size
of an image, because you are trimming space that
is not needed for viewing purposes. When cropping an image,
click the cropping tool and select the area you want to keep.
Double click the image and the image is cropped. That is all
there is to it; cropping is one of the easiest tasks you can
perform in imaging.


Saving Images: When you are finished resizing and
cropping your image, and finished making any additional
changes, you are ready to save your image. All you need to
do here is name it according to the naming structure you set
up for displaying images online, locate the folder in which
images for display for this particular project reside, and save
it in the JPEG format.


7.11.3. Conversion to Electronic Text 


We use Optical Character Recognition (OCR) software
to open the TIFF image of the scanned document and
convert it into electronic text. We select the necessary
portion of text, and put it into a format in which we can
edit the text for accuracy and usability.


We can use a technique called zoning to select the text,
then save it as a plain text (.txt) document to edit in a
program such as Microsoft Word, Corel WordPerfect,
Notepad, or WordPad. Zoning is the act of selecting an area
or areas of text on an image that you want OCR’d.


OCR is the act of taking your archival TIFF images and 
converting them into readable and editable text. In the process 
of OCR’ing, you are not changing or altering your image files; 
instead, you are using a program to read the text in an image 
and create text files that are used for long-term storage and 
to mark up your documents for online viewing. OCR’ing an 
image is a fairly simple process. For example, you open your 
OCR program and select the image you want OCR’d. Next, 
you zone the areas of text. Finally, you save the file as a .txt 
file and begin editing and proofreading your file. 


These are the basic steps involved in OCR’ing an 
image. Of course, there are some steps such as naming your 
file and deciding which folder to save it to that you should 
have already outlined in your project's initial setup. It should 
be noted that you can usually select more than one image to 
OCR at a time, but be careful not to do too many images at
once. OCR’ing too many images at a time could cause your
computer to crash, and that is when you could lose some
information. You can safely OCR about ten pages at a time. 
If an article is more than ten pages in length, continue to create 
new text files until you are finished OCR’ing the entire article. 
Once you have finished OCR’ing, you can combine the text 
files into one text file by copying and pasting the information 
from the other text files in the order you OCR’d the article. 
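

The batching and recombining described above can be sketched in a few lines. This is an illustration only: the OCR step itself is left out, the file names are hypothetical, and the batch size of ten is simply the figure suggested in the text.

```python
import tempfile
from pathlib import Path

BATCH_SIZE = 10  # pages per OCR run, as suggested in the text


def batches(pages, size=BATCH_SIZE):
    """Split a list of page images into consecutive groups of at most `size`."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def combine_text_files(part_paths, combined_path):
    """Concatenate the per-batch text files in the order given."""
    with open(combined_path, "w", encoding="utf-8") as out:
        for part in part_paths:
            out.write(Path(part).read_text(encoding="utf-8"))
            out.write("\n")


# A hypothetical 23-page article is OCR'd in three runs.
pages = [f"page{n:03d}.tif" for n in range(1, 24)]
print([len(group) for group in batches(pages)])  # [10, 10, 3]

# Stand-in text files, one per OCR run, combined in article order.
with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for number, text in enumerate(["pages 1-10", "pages 11-20", "pages 21-23"], start=1):
        part = Path(tmp) / f"article_part{number}.txt"
        part.write_text(text, encoding="utf-8")
        parts.append(part)
    combined = Path(tmp) / "article.txt"
    combine_text_files(parts, combined)
    print(combined.read_text(encoding="utf-8"))
```

Keeping the part files in a list, rather than relying on directory order, ensures that the combined file follows the order in which the article was OCR'd.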


Once you have OCR’d a document, you need to 
proofread it, or proof for short. When you proof a document,
you check each letter, number, word, punctuation mark, 
symbol and so forth to be sure it matches exactly with the 
corresponding character in the printed document. The 
proofing of a document requires concentration and patience 
to check each and every paragraph, sentence, word, and 
punctuation mark to ensure they exactly match their 
counterpart in the printed document. The best way to proof a 
document that is electronic is to have the original directly in 
front of you or on a bookstand. The book or document should 
be in close enough proximity that you can easily read the text 
and compare it to the text on the screen. After you have 
opened the text file or files that you are proofing, increase 
the font size of the text to 12 point or larger. This allows for 
the letters, numbers and characters to display larger onscreen 
so your eyes are not strained as much as if you were looking 
at smaller type. The next step is only available if you use a 
high-end word processor such as Microsoft Word or Corel 
WordPerfect. Notepad and WordPad do not have the following 
capability. 


If you do not want to adjust the point size of the font,
you can always increase the percentage size of the document
as it is displayed. This saves you from having to
globally change the font size of the document. The following
steps increase the viewing area of your document in
Word. From the menu bar, select View, then Zoom. The Zoom
dialog box opens and allows you to increase or decrease the
viewing area of the document.


Another way to use this function without using the menu 
is to look at the toolbar and find the drop-down menu that has 
a number with a percent sign. You can pull down the menu 
and select a larger or smaller percentage to adjust the viewing 
screen. Or, you can click once inside the pull-down menu 
and enter a custom percentage to adjust the viewing area. If 
the toolbar is not displayed, you can click View from the menu
bar, go to Toolbars, and make sure the Standard option is
checked.


Proofreading is perhaps the most mission-critical step 
of the entire digitization process. It is of the utmost importance 
that the online version of the text maintains and fully 
represents the original document. We cannot stress enough 
how important this step is to the quality of a project. Many
people, such as scholars, students, and historians, will cite,
refer interested parties to, and use your project as an absolute 
reproduction of the original material. If there are flaws in a 
project, most notably spelling, or worse, any omission of words 
or paragraphs, people will notice and the word will quickly 
spread that your site cannot be considered a trusted source 
for viable information. When the integrity of your operation 
and the credibility of the information produced by your staff 
are questioned by others, it is detrimental for all involved. If 
nobody trusts your practices and content, then your center 
will face problems in many areas including ridicule from faculty 
at your own and other institutions, spiteful letters or email 
from patrons, harsh criticism from your superiors, and getting 
turned down for grants and/or funding. 


Take steps to ensure that your digitization efforts are 
not in vain. There are several ways you can make sure your 
material is as accurate as possible: 


- Use a good OCR program such as OmniPage Pro or
Prime Recognition to achieve the best possible
conversion from the beginning. The fewer the errors at
this point, the better. This is not the place to try to save
money; buy the best OCR software you can afford.


- Develop a workflow process that guarantees that the
files will be proofread at least twice by two different
people.


- When you begin marking up the proofed text files in
XML, it is a good idea to have the original document in
front of you so you can preserve the structure of the
original; for example, maintain headings, paragraphs,
punctuation, and italicized words. Proofread the
document or article a third time as you are marking it
up in order to catch any mistakes or omissions that
might have made it past the first two people, because
this is the last stage of editing before the document
goes online.


- After you have made the XML to HTML conversion,
check the HTML file in your browser to make sure
everything such as headings, paragraphs, footnotes,
photographs, and links is working properly. It
is also a good idea to glance over the text and spot
check spelling, and check for special characters that
are not displaying properly or are missing.


By following these few steps, you can ensure that your 
digitization efforts are exacting and meet the 99.95 percent 
accuracy goal. Do not rush the proofreading process; it is a 
long, slow process that takes time to finish, but the rewards 
are well worth the time spent making sure your projects are 
an accurate representation of the original document. 
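

To put the 99.95 percent goal in concrete terms, it allows 5 wrong characters in every 10,000, or one in every 2,000. The sketch below is one way to check a proofed text against a trusted transcription. It uses Python's difflib, which gives an approximate count of differing characters rather than an exact edit distance, and the function names are illustrative assumptions.

```python
import difflib

ERRORS_PER_10000 = 5  # 99.95 percent accuracy = 5 errors per 10,000 characters


def count_errors(original, proofed):
    """Approximate number of characters that differ between the two texts."""
    matcher = difflib.SequenceMatcher(None, original, proofed, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return max(len(original), len(proofed)) - matched


def allowed_errors(char_count):
    """Errors permitted by the accuracy goal for a text of this length."""
    return char_count * ERRORS_PER_10000 // 10000


def meets_goal(original, proofed):
    """True if the proofed text is within the accuracy goal."""
    return count_errors(original, proofed) <= allowed_errors(len(original))


print(count_errors("abcd", "abxd"))                   # 1
print(allowed_errors(10000))                          # 5
print(meets_goal("form and from", "from and from"))   # False
```

Integer arithmetic is used for the allowance so that a 10,000-character text is allowed exactly 5 errors, without floating-point rounding. A text shorter than 2,000 characters is allowed none.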


XML Markup: XML, or Extensible Markup Language,
is an essential part of the digitization of text. A W3C
recommendation, XML marks up the structure of the original
document, and provides an application-, platform-, and
vendor-independent format that will be usable far into the
future.


The original markup language and the ancestor of 
HTML and XML is SGML, or Standard Generalized Markup 
Language. SGML is an international standard, ISO 8879,
published in 1986. It is structural
markup, which means that it marks the structural elements 
of a document, e.g. heading, abstract, chapter, section, 
subsection, and paragraph. It has long been used by industry 
and the military to mark up manuals for training and 
operations. SGML uses a DTD, or Document Type Definition, 
to set the rules for markup within a set of documents. The 
DTD specifies what tags may be used, and which tags may 
be used inside other tags. There are different types of DTDs 
for different classes of documents, and anyone can write a 
DTD to use for their document set. An SGML document is 
useless without its DTD, because no one will know what the 
tags signify. An advantage of SGML is its flexibility; you can 
define the tags in your DTD to mark whatever aspects of the 
text are important for your purposes. The main drawback to 
SGML is its complexity. It is very difficult to learn and to use. 
It cannot be displayed in a web browser. Its flexibility also 
acts as a disadvantage, because it makes it very difficult for 
programmers to write software for SGML. The SGML software 
that is available is more expensive than most libraries can 
afford. 


HTML, or Hypertext Markup Language, is the one with 
which most people are familiar. This is the language of the 
Web. It is a limited tag set of SGML, and it is only concerned 
with marking text up for layout and display. It does not 
preserve the structure of the document as SGML or XML do. 
HTML is flexible, forgiving and simple to use, which makes it 
easy for anyone to create files for the Web; however, these
same attributes work against it as an archival format. To 
preserve the integrity of an electronic text over time, the 
markup language should tell us something about the structure 
of the document, and should adhere to certain grammatical 
standards. This is where XML comes in. 


Panel 7.1 describes cascading style sheets (CSS) and 
extensible style language (XSL), which are the methods of 
providing style sheets for HTML and XML. 


Panel 7.1. Cascading Style Sheets and 
Extensible Style Language 


Markup languages describe the structural elements 
of a document. Style sheets specify how the elements 
appear when rendered on a computer display or 
printed on paper. Cascading Style Sheets (CSS) were 
developed for use with HTML markup. The Extensible 
Style Language (XSL) is an extension of CSS for XML 
markup. Much as XML is a simplified version of full 
SGML that provides a simple conversion from HTML, 
XSL is derived from Document Style Semantics and 
Specification Language (DSSSL), and any CSS style 
sheet can be converted to XSL by a purely mechanical 
process. The original hope was that XSL would be a 
subset of DSSSL, but there are divergences. Currently 
XSL is only a specification, but there is every hope 
that, as XML is widely adopted, XSL will become 
equally important. 


In CSS, a rule defines styles to be applied to selected
elements of a document. For example, the simple rule


h1 {color: blue}


states that for elements tagged h1 (the
HTML tag for top-level headings) the property color
should have the value blue. More formally, each rule
consists of a selector, which selects certain elements
of the document, and a declaration (enclosed in
braces, and having two parts, separated by a colon: a
property and a value), which states the style to be
applied to the elements.


A CSS style sheet is a list of rules. Various conventions
are provided to simplify the writing of rules. For
example,


h1, h2 {font-family: sans-serif; color: blue}


specifies that headings h1 and h2 are to be displayed in blue in
a sans-serif font.


The markup for an HTML document defines a structure 
that can be represented as a hierarchy. Thus, 
headings, paragraphs, and lists are elements of the 
HTML body; list items are elements within lists; and 
lists can be nested within other lists. The rules in CSS 
style sheets also inherit styles from other rules. If no 
rule explicitly selects an element, it inherits the rules 
for the elements higher in the hierarchy. For example, 
consider the following pair of rules. 


body {font-family: serif}
h1, h2 {font-family: sans-serif} 


Headings h1 and h2 are elements of the HTML body 
but have an explicit rule: they will be displayed in a 
sans-serif font. Because in this example there is no 
explicit rule for paragraphs or lists, they inherit the 
styles that apply to body, which is higher up the 
hierarchy; thus, they will be displayed in a serif font. 
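

The inheritance behaviour described above can be modelled as a walk up the element hierarchy until a rule is found. The following is a simplified sketch, not a description of how a browser implements CSS: the small hierarchy is hypothetical, and it treats every property as inherited, which is true of font-family but not of all CSS properties.

```python
# Parent of each element in a small, hypothetical HTML hierarchy.
PARENT = {
    "body": None,
    "h1": "body",
    "h2": "body",
    "p": "body",
    "ul": "body",
    "li": "ul",
}

# The pair of rules from the example above, keyed by selector.
RULES = {
    "body": {"font-family": "serif"},
    "h1": {"font-family": "sans-serif"},
    "h2": {"font-family": "sans-serif"},
}


def resolve(element, prop):
    """Walk up the hierarchy until some rule sets `prop` explicitly."""
    while element is not None:
        if prop in RULES.get(element, {}):
            return RULES[element][prop]
        element = PARENT.get(element)
    return None


for tag in ("h1", "p", "li"):
    print(tag, resolve(tag, "font-family"))
# h1 sans-serif
# p serif
# li serif
```

The list item has no rule of its own and neither does its parent list, so the lookup continues up to body and returns serif.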


A style sheet must be associated with an HTML page. 
The designer has several options, including 
embedding the style at the head of a page and 
providing a link to an external file that contains the 
style sheet. Every browser has its own implicit style 
sheet, which may be modified by the user. A user may 
have a private style sheet. 


Since several style sheets may apply to the same
page, rules can conflict. Mechanisms
based on simple principles have been developed to
handle these situations. The most fundamental
principle is that when rules conflict, one is selected
and the others are ignored. Rules that explicitly select
elements have priority over inherited rules. The most
controversial convention is that when the designer’s
rule conflicts directly with the user’s, the designer’s
rule has precedence. A user who wishes to override
this rule can flag a rule “important.” Though this is
awkward, it does permit special style sheets to be
developed, e.g., for users with poor eyesight who wish
to specify large type.


XML is also a child of SGML, but unlike its parent
language, it has strict syntax rules. All XML documents must
meet the following requirements:


• The document must have a root element. This is an outer
wrapper tag like the <html></html> tags that enclose an HTML
document.


• Every tag must have a closing tag. SGML and HTML
allow you to use an opening tag only, e.g. <p> for a
paragraph. In XML the opening tag must have a
matching closing tag. The <p> would require a </p>.
This makes it easier to write software that acts on XML
documents, because you can use the opening tag to
tell the program to begin a certain function and the
closing tag to tell it to cease that function.


• Tags must nest cleanly: <p><b>hi there</b></p> rather
than <p><b>hi there</p></b>. Tags must close in the
reverse order of that in which they opened.


• Empty tags like <br> still exist, but take a different form,
<br/>, to indicate that there is no closing tag.


• All attribute values must be in quotation marks:
<hi rend="italics"></hi>, not <hi rend=italics></hi>.


• Tags are case sensitive and must match exactly. The
tag <bibl> could not close with the tag </Bibl>. The tags
<bibl> and <Bibl> could exist within the same document and
mean entirely different things. The difference in case is
enough to distinguish the tags. This is one of the
toughest adjustments for those used to working in
HTML, which is forgiving of tag case.


• You should declare at the top of the document that it is
an XML document. The basic declaration is
<?xml version="1.0"?>.


• An XML document may or may not have a DTD. If you
do use a DTD, you must declare it at the top of the
document in the format: <!DOCTYPE TEI.2 SYSTEM
"teixlite.dtd">. This declaration states the document type
name, the location (in this example, on the host system),
and the file name of the DTD.


An XML document must be well-formed, which means 
that it conforms to the requirements set out above. It may 
also be valid, which means that in addition to being a well- 
formed document, it conforms to the specifications of a 
Document Type Definition (DTD). To ensure that it conforms 
to the DTD, the XML document is validated against the DTD 
by a piece of software called a parser. The parser goes 
through the XML document, checks it against the DTD, and 
makes sure that all elements required by the DTD are present 
and that the tags are used appropriately. Common parsers 
include the programs SP and xmllint, which are available online
as freeware. Internet Explorer can be used to check that XML 
files are well-formed. 
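

Any XML parser can serve as a well-formedness check, because a conforming parser must reject a document that breaks the rules listed above. The sketch below uses the parser in Python's standard library as one example; the sample fragments are made up for illustration. Note that it checks well-formedness only. This parser does not validate a document against a DTD, so validity still has to be checked with a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "nested cleanly": "<p><b>hi there</b></p>",
    "closed out of order": "<p><b>hi there</p></b>",
    "missing closing tag": "<p>hi there",
    "empty tag": "<p>line one<br/>line two</p>",
    "unquoted attribute": "<hi rend=italics>text</hi>",
    "case mismatch": "<bibl>title</Bibl>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
# nested cleanly: True
# closed out of order: False
# missing closing tag: False
# empty tag: True
# unquoted attribute: False
# case mismatch: False
```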


XML in its raw form does not display on the Web. You 
must use a style sheet to render the XML document for 
browser viewing. IE 6.0 and higher displays XML files using 
cascading style sheets (CSS) like those used with HTML files. 
To be sure that your XML documents are viewable by all 
browsers, however, you must transform them into HTML using 
XSLT (Extensible Stylesheet Language Transformations).
XSLT is really a program written in XML that transforms the 
XML tags into specified HTML tags with the help of an XSLT 
processor such as Saxon. 
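

The idea behind such a transformation, replacing each structural tag with a display tag, can be illustrated without an XSLT processor. The sketch below is not XSLT: it is a hand-written tag mapping in Python that mimics what a very simple stylesheet does. The tag names and the mapping are simplified assumptions, and attributes are dropped for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from structural XML tags to HTML display tags.
TAG_MAP = {"text": "html", "body": "body", "head": "h1", "p": "p", "hi": "i"}


def to_html(element):
    """Copy an element tree, renaming each tag according to TAG_MAP."""
    new = ET.Element(TAG_MAP.get(element.tag, "div"))
    new.text = element.text
    new.tail = element.tail
    for child in element:
        new.append(to_html(child))
    return new


source = ET.fromstring(
    '<text><body><head>Title</head>'
    '<p>Some <hi rend="italics">marked</hi> text.</p></body></text>'
)
print(ET.tostring(to_html(source), encoding="unicode"))
# <html><body><h1>Title</h1><p>Some <i>marked</i> text.</p></body></html>
```

A real XSLT stylesheet expresses the same kind of mapping as template rules written in XML, and can also reorder, filter and number elements, which this sketch does not attempt.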


There are two XML DTDs that are commonly used by
libraries: the Encoded Archival Description (EAD) DTD and
the Text Encoding Initiative (TEI) DTD. EAD documents have
two major parts: the EAD header, containing metadata about
the finding aid, and the <archdesc> element, containing the
encoded text of the finding aid itself. The header has four
elements: <eadid>, a unique identifier for the file;
<filedesc>, the file description, with bibliographic information
about the finding aid; <profiledesc>, the profile description,
with information about where and when the file was encoded
and by whom; and <revisiondesc>, information about any
revisions to the EAD file.


The <archdesc> contains <did> elements (descriptive
identification) that contain sub-elements describing the
containers and contents of the archival collection.


The Text Encoding Initiative (TEI) Guidelines are used
for creating full-text resources. The TEI DTD preserves the
structure of the text being encoded, be it novel, letter, poem,
drama, diary, or journal article. It also allows encoding of
names, places, dates, <keywords>, and linguistic elements.
Most users find that a subset of the full TEI DTD, the TEI Lite,
will meet their needs. Like the EAD, TEI documents have
two basic parts: the TEI header, containing metadata about
the document, and the <text> element, subdivided into
<front>, <body> and <back> elements, containing the
encoded text of the document. The <body> is divided into
<div> elements that may be defined to describe the structure
of the document, whether they are chapters, sections,
stanzas, or acts. The top-level <div> elements may contain
additional levels of <div> elements, which may be numbered,
e.g. <div1> <div2> <div3></div3> </div2> </div1>. While
numbering of <div>s is not required, it can be very useful in
keeping track of the structure of the document and avoiding
encoding errors.
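

As an illustration of the two-part structure and the numbered divisions, the sketch below builds a bare skeleton and lists its divisions by depth. It is an outline only and would not validate against the TEI Lite DTD, which requires further elements inside the header. The type values and the helper function are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Skeleton: a header for metadata and a <text> with front, body and back.
root = ET.Element("TEI.2")
header = ET.SubElement(root, "teiHeader")
text = ET.SubElement(root, "text")
front = ET.SubElement(text, "front")
body = ET.SubElement(text, "body")
back = ET.SubElement(text, "back")

# Numbered divisions: a chapter (div1) containing a section (div2).
chapter = ET.SubElement(body, "div1", {"type": "chapter"})
section = ET.SubElement(chapter, "div2", {"type": "section"})
ET.SubElement(section, "p").text = "First paragraph of the section."


def div_outline(element, depth=0):
    """List (depth, tag, type) for every nested div-like element."""
    outline = []
    for child in element:
        if child.tag.startswith("div"):
            outline.append((depth, child.tag, child.get("type")))
            outline.extend(div_outline(child, depth + 1))
    return outline


print(div_outline(body))
# [(0, 'div1', 'chapter'), (1, 'div2', 'section')]
```

With numbered divisions, the depth of each element can be read directly from its tag name, which makes misplaced sections easier to spot when proofreading the markup.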


Metadata comes next. It has already been discussed in
Chapter 6, but for continuity and to make the process easier
to follow, it is briefly discussed again below:


Metadata is simply data about data. The most familiar 
examples are catalog or MARC records. In digital collections, 
there are a number of types of metadata. Some files, including 
EAD, TEI, and TIFF files, contain metadata in a header that 
is part of the file. Other metadata schemes hold information 
about a file independently of the file itself. Two of the most
commonly used are the Dublin Core and the Metadata
Encoding and Transmission Standard (METS).


The Dublin Core was developed with input from a 
number of library organizations. Meetings to work out the 
Dublin Core Metadata Element Set were held at OCLC’s 
headquarters in Dublin, Ohio, giving the metadata scheme 
its name. There are 15 elements defined in the Dublin Core,
and all of them are optional and repeatable. One aim of the
Dublin Core development group was to come up with a simple,
broadly applicable metadata scheme that could be implemented by
organizations and projects of all sizes. There is a Dublin Core
XML DTD, but Dublin Core elements can also be added to
HTML files in the <meta> tag or stored in a database. The 
Dublin Core has gained wide acceptance, largely because of 
its simplicity, flexibility, and applicability to materials in any 
format. Open Archives Initiative, which endeavours to build a 
union catalogue of online collections by harvesting metadata 
from the collections, requires Dublin Core metadata. 
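

As a sketch of how Dublin Core elements can be added to an HTML file, the function below turns a simple record into <meta> tags using the common "DC." prefix for element names. The record contents and the function name are illustrative assumptions; the 15 element names are those of the Dublin Core Metadata Element Set.

```python
from html import escape

DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_meta_tags(record):
    """Build HTML <meta> tags for the Dublin Core elements in `record`.

    Values may be a single string or a list of strings, since every
    element is optional and may be repeated.
    """
    tags = []
    for element in DC_ELEMENTS:
        values = record.get(element, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            content = escape(value, quote=True)
            tags.append(f'<meta name="DC.{element}" content="{content}">')
    return tags


record = {
    "title": "Manual of Digital Libraries",
    "creator": ["Anil K. Dhiman", "Yashoda Rani"],
}
for tag in dublin_core_meta_tags(record):
    print(tag)
# <meta name="DC.title" content="Manual of Digital Libraries">
# <meta name="DC.creator" content="Anil K. Dhiman">
# <meta name="DC.creator" content="Yashoda Rani">
```

The resulting tags belong in the <head> of the HTML file. Escaping the values keeps quotation marks or angle brackets in a title from breaking the markup.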


METS is another XML-based standard for metadata.
It does not seek to replace other metadata schemes, such
as Dublin Core or the TEI header, but rather to provide a
structure in which to collect and reference them in an XML
document. It provides for the inclusion of information about
various formats and representations of digital objects, e.g.
an audio file and a TEI-encoded transcript of it, and for
administrative metadata such as licensing and rights
information.


METS is new and not yet widely implemented, but will 
likely be an important metadata format for digital libraries. 


Next come Databases, which can be the best way to 
manage the digital objects in your collection. They can hold 
the metadata, image files, digital audio/video, text files, and 
web pages. Many websites are database driven. Databases 
can be searched by field or keyword, making it easy for users 
to find the information they need. The best way to get a 
database that meets your particular needs is to design one 
yourself, for example in an SQL-based system such as Oracle. 
This, however, requires considerable expertise and usually 
involves hiring an experienced programmer or database 
designer. Most institutions will find it easier and more 
cost-effective to go with an out-of-the-box database solution. 
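
To make the difference between a field search and a keyword search concrete, here is a small sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are assumptions made for illustration; a production digital library system would have a much richer schema.

```python
import sqlite3

# In-memory database; the schema and the rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (file_name TEXT, title TEXT, creator TEXT)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [
        ("v001p001.tif", "Town council minutes", "Clerk's office"),
        ("v001p002.tif", "Harbour survey map", "Survey department"),
    ],
)


def search_by_field(conn, creator):
    """Field search: exact match on the creator column."""
    rows = conn.execute(
        "SELECT file_name FROM objects WHERE creator = ? ORDER BY file_name",
        (creator,),
    )
    return [row[0] for row in rows]


def search_by_keyword(conn, word):
    """Keyword search: substring match across the title and creator columns."""
    pattern = "%" + word + "%"
    rows = conn.execute(
        "SELECT file_name FROM objects "
        "WHERE title LIKE ? OR creator LIKE ? ORDER BY file_name",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

The "?" placeholders keep user input separate from the SQL text, which matters as soon as the search form is exposed on a website.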


There are a number of powerful digital library database 
systems produced by well-known library software vendors. 
Most of these are XML-based, which is good news for 
migration and long-term preservation. Most are also quite 
expensive, although there is usually scaled pricing based on 
the size of the institution, and it may be possible to join 
together with other institutions to obtain consortium pricing. 
You may also decide to go with a common database package 
like Microsoft Access or FileMaker Pro. These may be 
adequate for the internal management of small collections, 
but would not be appropriate for online digital library 
management. 


7.11.4. Naming and Saving Files 


Proper naming of files is of great importance in 
digitization for many reasons. First, when choosing a naming 
convention for your files, you are setting up how each 
particular file will be identified, usually by volume and page 
number. Second, the naming convention applies to all files 
created from the original TIFF image, such as the text file, 
XML file, and HTML file. In addition, your particular naming 
convention can also apply to photographs by simply adding 
photo to the end of the file name. Finally, it is important to 
understand that the naming convention also outlines the 
navigation for the files you present online. 


Developing a first-rate file naming convention is an 
essential part in making your project work efficiently and 
correctly. The following is an example of how we can use the 
file naming convention for a set of books: 


— Begin the name with a V or v - this represents volume. 


— Use the actual volume number of the set you are naming 
- i.e. 2, 02, 002, or 0002, depending on how many 
volumes are in the collection you are working with. 

— Use P or p to represent page. 


— Follow with the actual page number of the document - 
i.e. 1, 01, 001, or 0001, depending on how many pages 
are in the book. 


The following shows what this naming convention would 
look like for books of different lengths: 


• a book with 99 or fewer pages: v1p2 or v1p02; 


• a book with 999 or fewer pages: v2p009; v2p025; 
v2p500; 

• a book with 1,000 or more pages: v5p0001; v5p0010; 
v5p0975; v5p1250. 
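
The convention above can be sketched as a small Python helper. The function name and its parameters are hypothetical; the only thing taken from the text is the pattern itself: v, volume number, p, zero-padded page number, and an optional suffix such as photo.

```python
def page_file_name(volume, page, volume_digits=1, page_digits=4, suffix=""):
    """Build a name such as v5p0010 from a volume number and a page number.

    volume_digits and page_digits set the zero-padded width of each number;
    suffix is appended unchanged (for example "photo").
    """
    return "v{}p{}{}".format(
        str(volume).zfill(volume_digits),
        str(page).zfill(page_digits),
        suffix,
    )


print(page_file_name(5, 10))                    # v5p0010
print(page_file_name(2, 25, page_digits=3))     # v2p025
print(page_file_name(7, 1, volume_digits=3, page_digits=3))  # v007p001
```

Generating names from one function, rather than typing them by hand, is one way to keep the padding and the letter case identical across a whole project.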


But remember that the page numbers in a file-naming 
convention should be padded with leading zeros; otherwise the 
order of the files in your folder will not be sequential. 
Without padding, file names are sorted character by character, 
so all pages that begin with a ‘1’ come first, then those that 
begin with a ‘2’, then ‘3’, and so on. For instance, pages 1, 
10-19, 100-199 and 1000 would all come before page 2. When you 
try to arrange your files, they will not arrange in sequential 
order. If you want to avoid that issue, all you have to do is 
place a ‘0’ in front of the page number in the naming 
convention. Be sure that you place the appropriate number of 
zeros in front of the number if your collection is more than 
100 or 1,000 pages. This helps to keep all files in proper 
sequence. 
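
A short Python demonstration of the sorting problem, using an invented set of page numbers, shows why the padding matters. Plain alphabetical sorting is assumed here, which is how many file listings order names.

```python
pages = [1, 2, 10, 11, 100]

# Names without leading zeros sort character by character.
unpadded = sorted("v1p{}".format(n) for n in pages)

# Names padded to three digits sort in true page order.
padded = sorted("v1p{:03d}".format(n) for n in pages)

print(unpadded)  # ['v1p1', 'v1p10', 'v1p100', 'v1p11', 'v1p2']
print(padded)    # ['v1p001', 'v1p002', 'v1p010', 'v1p011', 'v1p100']
```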


You can see how you can structure a book or a set of 
books with this naming convention. It is important to be 
consistent in naming the files of each project the same way. 
For example, if you begin naming a volume with a capital V 
rather than a lower case v, be sure you begin each file name 
with a capital V throughout the entire project - do not ever 
switch between upper and lower case when naming a project's 
files. Think of the future — your current operating environment 
may not be case sensitive, but future operating systems on 
which you load your file may be. Be consistent to ensure your 
projects have as few problems as possible. 
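
One way to catch inconsistent names early is an automated check. The sketch below assumes a convention of lowercase v, a three-digit volume, lowercase p, and a four-digit page; the pattern and the function name are illustrative and would need to be adapted to the convention a project actually uses.

```python
import re

# Assumed convention: lowercase v, 3-digit volume, lowercase p, 4-digit page.
NAME_PATTERN = re.compile(r"v\d{3}p\d{4}")


def nonconforming(names):
    """Return the names that do not match the convention exactly."""
    return [name for name in names if not NAME_PATTERN.fullmatch(name)]


# A capital V and a page number that is too short are both reported.
print(nonconforming(["v007p0001", "V007p0002", "v007p003"]))
```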


Once you have created your file naming convention, it 
is a good idea to document and explain the steps involved. 
Putting the instructions in a text document is a good idea 
for training purposes, but you should also include them in 
the header of the XML document. This is important for many 
reasons. First, if the text document describing the file-naming 
structure is lost, you can easily find the description in each 
XML document. Second, if the staff who originally created the 
naming convention are no longer employed at your institution 
and you cannot reach them to ask questions, the naming 
convention information is contained in the XML document’s 
TEI header and can be retrieved, so you can see how the 
files are named. Finally, it is important to keep the file naming 
structure the same and not change it at any point during the 
digitization process. 
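
As an illustration of recording the convention in the TEI header, the following Python sketch builds a header fragment with the standard library. Placing the note under encodingDesc and editorialDecl is an assumption on our part (the text only says "the header"), and a complete TEI header would also need a fileDesc element, which is omitted here.

```python
import xml.etree.ElementTree as ET


def naming_note_header(note):
    """Return a teiHeader fragment that carries the naming-convention note."""
    header = ET.Element("teiHeader")
    encoding_desc = ET.SubElement(header, "encodingDesc")
    editorial_decl = ET.SubElement(encoding_desc, "editorialDecl")
    paragraph = ET.SubElement(editorial_decl, "p")
    paragraph.text = note
    return ET.tostring(header, encoding="unicode")


note = (
    "File names: v + three-digit volume + p + three-digit page, "
    "for example v007p001; always lowercase."
)
print(naming_note_header(note))
```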


The digitization process involves several steps, and 
many different file types are created along the way. It is easy 
to lose track of where you and your employees are on a 
particular project if you do not keep some kind of record of 
your progress. There might be times when you are working on 
multiple projects, and thus constantly working in several 
different folders, using different naming conventions, and 
working in different areas, such as XML markup on one project 
and OCR’ing and proofing on another. This can become confusing 
and frustrating, and searching for files takes a lot of time 
out of your schedule. Helping your employees find files and 
figure out what part of a project they should be working on 
also takes much-needed time and resources away from the actual 
project. That is why it is extremely important to develop a 
good workflow process and to document progress with a workflow 
log that people use for each phase of a project. The 
digitization process normally follows these steps: 

— Scanning, saving, and naming TIFF image files. 
— OCR’ing the TIFF images and naming and saving as a 
text file. 
— Proofreading the text file for accuracy. 
— Marking up in XML. 

Table 7.6 shows an example of a workflow log created 
for these steps. 




TIFF file range    OCR’d and saved   New file name        Proofed against   XML markup 
scanned            as .txt                                original 

TIFF pages         Initials/Date     Combined all         Initials/Date     Initials/Date 
scanned and                          OCR’d pages from 
named according                      TIFF images and 
to file naming                       named for first 
convention                           page in the series 

Pages 01-10        DM-11-22-08       v007p001             DM-11-22-08       CA-12-01-08 
Pages 11-21        DM-11-22-08       ”                    DM-11-22-08       CA-12-01-08 
Pages 22-32        DM-11-22-08       v007p022             DM-11-23-08       DM-12-02-08 
Pages 33-40        DM-11-22-08       ”                    DM-11-23-08       CA-12-02-08 
Pages 41-56        DM-11-22-08       v007p041             DM-11-23-08       CA-12-02-08 
Pages 57-65        DM-11-22-08       v007p057             DM-11-24-08       CA-12-03-08 
Pages 66-72        DM-11-22-08       ”                    DM-11-24-08       CA-12-03-08 
Pages 73-85        DM-11-22-08       v007p073             DM-11-24-08       CA-12-03-08 
Pages 86-90        DM-11-22-08       v007p086             DM-11-24-08       CA-12-04-08 
Pages 91-101       DM-11-22-08       v007p091             DM-11-25-08       CA-12-05-08 
Pages 102-12       DM-11-22-08       v007p102             DM-11-25-08       CA-12-05-08 


Table 7.6. Sample workflow log 




Establishing good workflow procedures can make a 
huge difference in your center’s productivity and operations. 
The best way to begin establishing a workflow process is to 
first define what needs to be done. In our case, we need to 
scan pages and save them as TIFF images. This should be 
the first step, and all pages should be scanned before starting 
another step. When all the pages are scanned, it is time to 
begin OCR’ing the images into text files. Once all pages have 
been OCR’d and combined into text files, it is time to begin 
proofreading and editing the text files for accuracy. Once that 
step is completed, the files are ready for XML markup. 
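
A workflow log of this kind can also be kept in a small program that refuses to record a step before the earlier ones are done. The sketch below is one possible design under stated assumptions: the step names, the dictionary layout, and the function name are hypothetical, and the initials and dates simply mirror the style of the sample log.

```python
# Steps in the order the text prescribes; the identifiers are illustrative.
STEPS = ["scanned", "ocr", "proofed", "xml_markup"]


def record_step(log, page_range, step, initials, date):
    """Record a completed step for a page range, refusing to skip steps.

    log maps a page range (e.g. "01-10") to a dict of step -> "INITIALS-DATE".
    Raises ValueError if an earlier step has not been recorded yet.
    """
    entry = log.get(page_range, {})
    position = STEPS.index(step)
    missing = [s for s in STEPS[:position] if s not in entry]
    if missing:
        raise ValueError(
            "{}: complete '{}' before '{}'".format(page_range, missing[0], step)
        )
    entry[step] = "{}-{}".format(initials, date)
    log[page_range] = entry
    return entry


log = {}
record_step(log, "01-10", "scanned", "DM", "11-22-08")
record_step(log, "01-10", "ocr", "DM", "11-22-08")
print(log)
```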


It is important for you and your staff to keep your 
workflow consistent and maintain the correct flow of 
operations. You should never skip around and do a step here 
or a step there. It is best to have everyone working on the 
same part of a project at the same time until that part is 
completed in order to avoid confusion. It also makes things a 
lot easier when answering questions and keeps everybody 
on the same page, so to speak. 


7.11.5. Putting Files Online 


Once you have created your site, uploading the files to 
your web server is a relatively simple task. To make sure 
everything is ready for online viewing, it is a good idea to 
check and see if all your links are linking to the proper places, 
that all your images and photographs are displaying properly, 
and that the project is completely finished. All you need to do 
is open the files in your browser and begin clicking on all 
links, footnotes, navigation bars, etc. Also, be sure to check 
that all images and photographs are displaying correctly. 
Once you have thoroughly checked your site for accuracy, it 
is time to upload your project to your web server. 
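
Clicking through every link by hand is the method described above; part of that check can be automated. The following Python sketch, offered as an assumption-laden illustration rather than a complete tool, scans one HTML file for href and src targets and reports the relative ones that do not exist on disk. It does not handle URL-encoded paths or links that are resolved by a web server.

```python
import os
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def broken_local_links(html_path):
    """Return relative link targets in html_path that are missing on disk."""
    with open(html_path, encoding="utf-8") as handle:
        parser = LinkCollector()
        parser.feed(handle.read())
    base = os.path.dirname(os.path.abspath(html_path))
    broken = []
    for target in parser.targets:
        # Skip external URLs, mail links, and in-page anchors.
        if "://" in target or target.startswith(("mailto:", "#")):
            continue
        # Drop any fragment or query string before checking the file.
        local = target.split("#", 1)[0].split("?", 1)[0]
        if not os.path.exists(os.path.join(base, local)):
            broken.append(target)
    return broken
```

Running the function over every HTML file in the project folder before uploading gives a list of links and images to fix.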


When you are ready to upload your files, the first thing 
you need to do is make sure you have an FTP program. FTP 
stands for File Transfer Protocol, a standard way of 
transferring a file or files from your computer or server to 
other locations; in this case, from your computer to the web 
server that is hosting your website. 


When using your FTP client software to upload the files 
to the web server, the first thing you need to do is create the 
root directory or folder in which the entire contents of the 
project will reside on the web server. The WS_FTP program is 
used here as an example. Once you have successfully opened your FTP 
program and logged into your web server, you are ready to 
begin putting your project online. The following is an example 
of how to use the FTP program when putting files online: 


— To create a new directory on your web server, click the 
MkDir button located on the right side of the Remote 
System dialog box, or right click inside the window and 
select Make Directory from the drop-down menu. 


— Type in the name of the folder you want to create and 
click OK. The folder now appears on your web server. 


— Double click the new folder you created to open the 
directory. 


— From the Local System, double click the appropriate 
project and its folders until you come to the folder that 
contains all the HTML files. 


— Double click to open the folder that contains all the 
HTML files. 


— Click the first folder or file in the directory - it will turn 
blue. 


— Press and hold the Shift key and begin selecting all the 
files in the folder by clicking them with your mouse until 
all files are selected. 


— Press the right arrow in the middle of the program to 
transfer all files to the selected folder on the web server; 
the files will transfer, and if you have speakers available 
and turned on, you will hear a video game type noise 
when all files have successfully transferred. 


— Click Close from the bottom right-hand side of the 
window to close the program. 
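
The same steps (make the directory, open it, transfer the files) can be scripted instead of clicked. The sketch below uses Python's standard ftplib module rather than WS_FTP; the host name, login details, and folder names in the commented example are placeholders, and treating a failed directory creation as "already exists" is a simplifying assumption.

```python
import os
from ftplib import FTP, error_perm


def upload_folder(ftp, local_dir, remote_dir):
    """Create remote_dir if needed, then upload every file in local_dir.

    ftp is a logged-in ftplib.FTP connection. Returns the uploaded names.
    """
    try:
        ftp.mkd(remote_dir)
    except error_perm:
        pass  # Assumed here to mean that the directory already exists.
    ftp.cwd(remote_dir)
    uploaded = []
    for name in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                ftp.storbinary("STOR " + name, handle)
            uploaded.append(name)
    return uploaded


# Example use; host, user name, password, and folders are placeholders.
# with FTP("ftp.example.org") as ftp:
#     ftp.login("user", "password")
#     upload_folder(ftp, "project/html", "project")
```

Plain FTP sends passwords unencrypted; where the server supports it, ftplib's FTP_TLS class is a drop-in alternative worth considering.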


Now you can go to the URL of your project's site and view 
it online to make sure you have transferred everything 
correctly. At this point, it is also a good 
idea to check all links, images, and navigation online as one 
last measure of quality assurance. 


In this way, the whole process is completed, from 
digitizing the material to uploading it to the Internet for 
wider access. 


7.12. DIGITAL ARCHAEOLOGY 


Digital archaeology, often referred to as data recovery, 
denotes a set of techniques that are applied as a last resort, 
and is therefore not a solution in any sense. The term applies 
to ‘methods and procedures to rescue content from damaged 
media or from obsolete or damaged hardware and software 
environments’ and involves ‘specialized techniques to recover 
bitstreams from media that has been rendered unreadable, 
either due to physical damage or hardware failure’ and is 
‘explicitly an emergency recovery strategy’. Special facilities, 
equipment and expertise are required and it is usually carried 
out by specialized data recovery companies whose expertise 
has established that it is possible to recover data from a wide 
range of media types. The recovery of the data does not 
necessarily lead to recovery of the ability to understand that 
data, although it is a necessary pre-condition for it. 


The UNESCO Guidelines caution us against relying on 
digital archaeology and remind us that it is ‘a very unreliable 
and high-risk substitute’ for a current and active preservation 
programme. For one thing, it is very expensive and most 
materials would not justify the costs of recovering them. But 
more important, it is not reliable; there is no guarantee that 
digital materials can be recovered, and, even if the data is 
recovered, there is no certainty that it will be intelligible. 


Ross and Gow’s study Digital Archaeology: Rescuing 
Neglected and Damaged Data Resources suggests that data 
recovery should be unnecessary if ‘good disaster planning’ 
is in place, but this very rarely is the case. They note that 
‘with sufficient resources much material that most of us would 
expect to be lost can be recovered’. So we should guard 
against any loss of stored information, but if a loss does 
happen, we must be ready to attempt recovery. 


Summing up, it can be said that the libraries and archival 
institutions have a unique opportunity in the area of digital 
preservation. From the viewpoint of communication, the digital 
repository must provide feedback about its policies and 
technologies, and be open to sharing this information with 
users. It is critical for the digital repository to be able to present 
the archival policy to a potential depositor or to explain in 
non-technological vocabulary how the digital objects are 
protected from fraudulent or inadvertent changes. As libraries 
and other institutions embark on the digital preservation 
process, judgment must be used to balance risk against the 
maturity of the process. Documents that are extremely rare 
or whose loss might cause considerable financial, 
environmental, or cultural disasters should not be entrusted 
to a relatively immature process. We would like to say that 
we will preserve our cultural heritage materials in perpetuity; 
however, the unknown and, furthermore, unknowable digital 
landscape suggests that any such guarantee would be 
inadvisable at this point. 


Indeed there are many challenges; nonetheless, the 
libraries must begin to assemble and integrate the policies, 
standards, methods and technologies for doing digital 
preservation. There is much research yet to be done. For 
example, we cannot yet easily discern if two digital objects 
are equivalent in structure and semantics. Libraries and 
archives will have to deal with these types of uncertainties in 
addition to such active management issues as migration of 
data, an area that is largely indeterminate given the rapid 
evolution of technology. Although there certainly are risks in 
undertaking digital preservation, libraries must begin to 
establish their reputations simply by deciding to get started 
in this new role. 




ABOUT THE AUTHORS 


Dr. Anil Kumar Dhiman holds M.A., M.Sc., MLISc., B.Ed. 
and PGDCA degrees, a Ph.D. in Botany and a Ph.D. in 
Library & Information Science. He is a Fellow and Life 
Member of various professional associations and has more 
than 100 papers and 27 books to his credit in both fields 
of his study, i.e., Library & Information Science and 
Botany. He has more than two decades of working 
experience in the library profession and is presently 
Information Scientist at Gurukul Kangri University, 
Hardwar (India). He taught Library Science at the Library 
Science Training Centre, Roorkee, for about a decade and 
was also Counsellor of Library Science at the Haridwar 
Centre of UPRTOU for two years during 2000-2002. He has 
also guided 22 candidates for their M.Phil. dissertations 
in library science in various universities. He is a 
recipient of the APSI Young Scientist Award and Gold 
Medal in 1999, the USSHLE-IJILIS-PBDBBS-2003 Award in 
Library & Information Science, and the Glory of India 
International Award and Gold Medal in 2006. 


Smt. Yashoda Rani holds M.A., M.Lib.Sc., M.Phil. 
(Library Science) and DNYS (Diploma in Naturopathy and 
Yogic Science). She has co-authored nine books with 
Dr. Dhiman and has also contributed a few papers to 
journals and seminar proceedings. 



Ess Ess Publications 
4831/24, Ansari Road, Darya Ganj, New Delhi-110 002 
Phones: 41563444, 23260807 Fax: 011-41563334 
E-mail: info@essessreference.com 
www.essessreference.com 

Rs. 3950/- 


